THERMOSTABLE PHOSPHATASES

This invention relates to newly identified
polynucleotides, polypeptides encoded by such
polynucleotides, the use of such polynucleotides and
polypeptides, as well as the production and isolation of such
polynucleotides and polypeptides. More particularly, the
polynucleotides and polypeptides of the present invention
have been identified as thermostable alkaline phosphatases.

BACKGROUND OF THE INVENTION 
Phosphatases are a group of enzymes that remove
phosphate groups from organophosphate ester compounds. There
are numerous phosphatases, including alkaline phosphatases,
phosphodiesterases and phytases.

Alkaline phosphatases are a widely distributed group of
enzymes which hydrolyze organic phosphate ester bonds at
alkaline pH.

Phosphodiesterases are capable of hydrolyzing nucleic
acids by hydrolyzing the phosphodiester bridges of DNA and
RNA. The classification of phosphodiesterases depends upon
which side of the phosphodiester bridge is attacked. The 3'
enzymes specifically hydrolyze the ester linkage between the
3' carbon and the phosphoric group whereas the 5' enzymes
hydrolyze the ester linkage between the phosphoric group and
the 5' carbon of the phosphodiester bridge. The best known
of the class 3' enzymes is a phosphodiesterase from the venom
of the rattlesnake or from a Russell's viper, which hydrolyses
all the 3' bonds in either RNA or DNA, liberating nearly all
the nucleotide units as nucleotide 5' phosphates. This
enzyme requires a free 3' hydroxyl group on the terminal
nucleotide residue and proceeds stepwise from that end of the
polynucleotide chain. This enzyme and all other nucleases
which attack only at the ends of the polynucleotide chains
are called exonucleases. The 5' enzymes are represented by
a phosphodiesterase from bovine spleen, also an exonuclease,
which hydrolyses all the 5' linkages of both DNA and RNA and
thus liberates only nucleoside 3' phosphates. It begins its
attack at the end of the chain having a free 3' hydroxyl
group.

Phytases are enzymes which recently have been introduced
to commerce. The phytase enzyme removes phosphate from
phytic acid (inositol hexaphosphoric acid), a compound found
in plants such as corn, wheat and rice. The enzyme has
commercial use for the treatment of animal feed, making the
inositol of the phytic acid available for animal nutrition.
Aspergillus ficuum and wheat are sources of phytase.
(Business Communications Co., Inc., 25 Van Zant Street,
Norwalk, CT 06855).

Phytase is used to improve the utilization of natural
phosphorus in animal feed. Use of phytase as a feed additive
enables the animal to metabolize a larger degree of its
cereal feed's natural mineral content, thereby reducing or
altogether eliminating the need for synthetic phosphorus
additives. More important than the reduced need for
phosphorus additives is the corresponding reduction of
phosphorus in pig and chicken waste. Many European countries
severely limit the amount of manure that can be spread per
acre due to concerns regarding phosphorus contamination of
ground water. This is highly important in northern Europe,
and will eventually be regulated throughout the remainder of
the European Continent and the United States as well.
(Excerpts from Business Trend Analysts, Inc., January 1994,
Frost and Sullivan Report 1995 and USDA on-line information.)

Alkaline phosphatase hydrolyzes monophosphate esters,
releasing inorganic phosphate and the cognate alcohol
compound. It is non-specific with respect to the alcohol
moiety, and it is this feature which accounts for the many
uses of this enzyme. The enzyme has a pH optimum between 9
and 10; however, it can also function at neutral pH (study
of the enzyme industry conducted by Business Communications
Company, Inc., 25 Van Zant Street, Norwalk, Connecticut
06855, 1995).

Thermostable alkaline phosphatases are not irreversibly
inactivated even when heated to 60°C or more for brief
periods of time, as, for example, in the practice of
hydrolyzing monophosphate esters.

Alkaline phosphatases may be obtained from numerous
thermophilic organisms, such as Ammonifex degensii, Aquifex
pyrophilus, Archaeoglobus lithotrophicus, Methanococcus
igneus, Pyrolobus (a Crenarchaeota), Pyrococcus and
Thermococcus, which are mostly Eubacteria and Euryarchaeota.
Many of these organisms grow at temperatures up to about
103°C and are unable to grow below 70°C. These anaerobes are
isolated from extreme environments. For example,
Thermococcus CL-2 was isolated from a worm residing on a
"black smoker" sulfide structure.

Interest in alkaline phosphatases from thermophilic
microbes has increased recently due to their value for
commercial applications. Two sources of alkaline
phosphatases dominate and compete commercially: (i) animal,
from bovine and calf intestinal mucosa, and (ii) bacterial,
from E. coli. Due to the high turnover number of calf
intestinal phosphatase, it is often selected as the label in
many enzyme immunoassays. The usefulness of calf alkaline
phosphatase, however, is limited by its inherently low
thermostability, which is even further compromised during the
chemical preparation of the enzyme:antibody conjugates.
Bacterial alkaline phosphatase is an alternative to calf
alkaline phosphatase due to bacterial alkaline phosphatase's
extreme thermotolerance at temperatures as high as 95°C
(Tomazic-Allen, S.J., Recombinant Bacterial Phosphatase as an
Immunodiagnostic Enzyme, Annales de Biologie Clinique,
49(5):287-90 (1991)); however, the enzyme has a very low
turnover number.

There is a need for novel phosphatase enzymes having
enhanced thermostability. This includes a need for
thermostable alkaline phosphatases whose enhanced
thermostability is beneficial in enzyme labeling processes
and certain recombinant DNA techniques, such as in the
dephosphorylation of vector DNA prior to insert DNA ligation.
Recombinant phosphatase enzymes provide the proteins in a
format amenable to efficient production of pure enzyme, which
can be utilized in a variety of applications as described
herein. Accordingly, there is a need for the
characterization, amino acid sequencing, DNA sequencing, and
heterologous expression of thermostable phosphatase enzymes.
The present invention meets these needs by providing DNA and
amino acid sequence information and expression and
purification protocols for thermostable phosphatases derived
from several organisms.

SUMMARY OF THE INVENTION 
The present invention provides thermostable phosphatases
from several organisms. In accordance with one aspect of
the present invention, there are provided novel enzymes, as
well as active fragments, analogs and derivatives thereof.

In accordance with another aspect of the present 
invention, there are provided isolated nucleic acid molecules 
encoding the enzymes of the present invention, including
mRNAs, cDNAs, genomic DNAs, as well as active analogs and
fragments of such nucleic acids.

In accordance with another aspect of the present
invention, there are provided isolated nucleic acid molecules
encoding mature enzymes expressed by the DNA contained in the
plasmid DNA vector deposited with the ATCC as Deposit No.
97536 on May 10, 1996.

In accordance with a further aspect of the present 
invention, there is provided a process for producing such 
polypeptides by recombinant techniques comprising culturing 
recombinant prokaryotic and/or eukaryotic host cells, 
containing a nucleic acid sequence of the present invention,
under conditions promoting expression of said enzymes and
subsequent recovery of said enzymes.

In accordance with yet a further aspect of the present 
invention, there is provided a process for utilizing such 
enzymes for hydrolyzing monophosphate ester bonds, as an 
enzyme label in immunoassays, for removing 5' phosphate prior 
to end-labeling, and for dephosphorylating vectors prior to
insert ligation. 

In accordance with yet a further aspect of the present 
invention, there are also provided nucleic acid probes
comprising nucleic acid molecules of sufficient length to
specifically hybridize to a nucleic acid sequence of the
present invention.

In accordance with yet a further aspect of the present
invention, there is provided a process for utilizing such
enzymes, or polynucleotides encoding such enzymes, for in
vitro purposes related to scientific research, for example,
to generate probes for identifying similar sequences which
might encode similar enzymes from other organisms by using
certain regions, i.e., conserved sequence regions, of the
nucleotide sequence.

These and other aspects of the present invention will be 
apparent to those of skill in the art from the teachings
herein.

BRIEF DESCRIPTION OF THE DRAWINGS

The following drawings are illustrative of embodiments 
of the invention and are not meant to limit the scope of the
invention as encompassed by the claims. 

Figure 1 is an illustration of the full-length DNA and
corresponding deduced amino acid sequence of Ammonifex
degensii KC4 of the present invention. Sequencing was
performed using a 378 automated DNA sequencer for all
sequences of the present invention (Applied Biosystems, Inc.,
Foster City, California).

Figure 2 is an illustration of the full-length DNA and
corresponding deduced amino acid sequence of Methanococcus
igneus Kol5.

Figure 3 is an illustration of the full-length DNA and
corresponding deduced amino acid sequence of Thermococcus
alcaliphilus AEDII12RA.

Figure 4 is an illustration of the full-length DNA and
corresponding deduced amino acid sequence of Thermococcus
celer.

Figure 5 is an illustration of the full-length DNA and
corresponding deduced amino acid sequence of Thermococcus
GU5L5.

Figure 6 is an illustration of the full-length DNA and
corresponding deduced amino acid sequence of OC9a.

Figure 7 is an illustration of the full-length DNA and
corresponding deduced amino acid sequence of M11TL.

Figure 8 is an illustration of the full-length DNA and 
corresponding deduced amino acid sequence of Thermococcus 
CL-2. 

Figure 9 is an illustration of the full-length DNA and 
corresponding deduced amino acid sequence of Aquifex VF-5. 

DETAILED DESCRIPTION OF THE INVENTION

To facilitate understanding of the invention, a number 
of terms are defined below. 

The term "isolated" means altered "by the hand of man" 
from its natural state; i.e., if it occurs in nature, it has
been changed or removed from its original environment, or
both. For example, a naturally occurring polynucleotide or
a polypeptide naturally present in a living animal in its
natural state is not "isolated", but the same polynucleotide
or polypeptide separated from the coexisting materials of its
natural state is "isolated", as the term is employed herein.
For example, with respect to polynucleotides, the term
isolated means that it is separated from the nucleic acid and
cell in which it naturally occurs.

As part of or following isolation, such polynucleotides
can be joined to other polynucleotides, such as DNAs, for
mutagenesis, to form fusion proteins, and for propagation or
expression in a host, for instance. The isolated
polynucleotides, alone or joined to other polynucleotides
such as vectors, can be introduced into host cells, in
culture or in whole organisms. Introduced into host cells in
culture or in whole organisms, such polynucleotides still
would be isolated, as the term is used herein, because they
would not be in their naturally occurring form or
environment. Similarly, the polynucleotides and polypeptides
may occur in a composition, such as a media formulation
(solutions for introduction of polynucleotides or
polypeptides, for example, into cells or compositions or
solutions for chemical or enzymatic reactions which are not
naturally occurring compositions) and therein remain
isolated polynucleotides or polypeptides within the meaning
of that term as it is employed herein.

The term "ligation" refers to the process of forming
phosphodiester bonds between two or more polynucleotides,
which most often are double stranded DNAs. Techniques for
ligation are well known to the art and protocols for ligation
are described in standard laboratory manuals and references,
such as, for instance, Sambrook et al., MOLECULAR CLONING, A
LABORATORY MANUAL, 2nd Ed.; Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, New York (1989).



The term "oligonucleotide" as used herein is defined as
a molecule comprised of two or more deoxyribonucleotides or
ribonucleotides, preferably more than three, and usually more
than ten. The exact size of an oligonucleotide will depend
on many factors, including the ultimate function or use of
the oligonucleotide. Oligonucleotides can be prepared by any
suitable method, including, for example, cloning and
restriction of appropriate sequences, and direct chemical
synthesis by a method such as the phosphotriester method of
Narang et al., 1979, Meth. Enzymol., 68:90-99; the
phosphodiester method of Brown et al., 1979, Meth. Enzymol.,
68:109-151; the diethylphosphoramidite method of Beaucage et
al., 1981, Tetrahedron Lett., 22:1859-1862; the triester
method of Matteucci et al., 1981, J. Am. Chem. Soc.,
103:3185-3191; or automated synthesis methods; and the solid
support method of U.S. Patent No. 4,458,066.

The term "plasmids" generally is designated herein by a
lower case p preceded and/or followed by capital letters
and/or numbers, in accordance with standard naming
conventions that are familiar to those of skill in the art.

Plasmids disclosed herein are either commercially
available, publicly available on an unrestricted basis, or
can be constructed from available plasmids by routine
application of well known, published procedures. Many
plasmids and other cloning and expression vectors that can be
used in accordance with the present invention are well known
and readily available to those of skill in the art.
Moreover, those of skill readily may construct any number of
other plasmids suitable for use in the invention. The
properties, construction and use of such plasmids, as well as
other vectors, in the present invention will be readily
apparent to those of skill from the present disclosure.

The term "polynucleotide(s)" generally refers to any
polyribonucleotide or polydeoxyribonucleotide, which may be
unmodified RNA or DNA or modified RNA or DNA. Thus, for
instance, polynucleotides as used herein refers to, among
others, single- and double-stranded DNA, DNA that is a mixture
of single- and double-stranded regions, single- and double-
stranded RNA, and RNA that is a mixture of single- and double-
stranded regions, hybrid molecules comprising DNA and RNA
that may be single-stranded or, more typically, double-
stranded or a mixture of single- and double-stranded regions.

In addition, polynucleotide as used herein refers to
triple-stranded regions comprising RNA or DNA or both RNA and
DNA. The strands in such regions may be from the same
molecule or from different molecules. The regions may
include all of one or more of the molecules, but more
typically involve only a region of some of the molecules.
One of the molecules of a triple-helical region often is an
oligonucleotide.

As used herein, the term polynucleotide includes DNAs or
RNAs as described above that contain one or more modified 
bases. Thus, DNAs or RNAs with backbones modified for 
stability or for other reasons are "polynucleotides" as that 
term is intended herein. Moreover, DNAs or RNAs comprising 
unusual bases, such as inosine, or modified bases, such as 
tritylated bases, to name just two examples, are 
polynucleotides as the term is used herein. 

It will be appreciated that a great variety of
modifications have been made to DNA and RNA that serve many
useful purposes known to those of skill in the art. The term
polynucleotide as it is employed herein embraces such
chemically, enzymatically or metabolically modified forms of
polynucleotides, as well as the chemical forms of DNA and RNA
characteristic of viruses and cells, including simple and
complex cells, inter alia.

The term "primer" as used herein refers to an
oligonucleotide, whether natural or synthetic, which is
capable of acting as a point of initiation of synthesis when
placed under conditions in which primer extension is
initiated or possible. Synthesis of a primer extension
product which is complementary to a nucleic acid strand is
initiated in the presence of nucleoside triphosphates and a
polymerase in an appropriate buffer at a suitable
temperature.

The term "primer" may refer to more than one primer,
particularly in the case where there is some ambiguity in the
information regarding one or both ends of the target region
to be synthesized. For instance, if a nucleic acid sequence
is inferred from a protein sequence, a "primer" generated to
synthesize nucleic acid encoding said protein sequence is
actually a collection of primer oligonucleotides containing
sequences representing all possible codon variations based on
the degeneracy of the genetic code. One or more of the
primers in this collection will be homologous with the end of
the target sequence. Likewise, if a "conserved" region shows
significant levels of polymorphism in a population, mixtures
of primers can be prepared that will amplify adjacent
sequences.
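
By way of illustration only, the collection of primer
oligonucleotides described above can be sketched as an
enumeration over the standard genetic code. The function names
(codons_for, degenerate_primer_pool) and the example peptide
are hypothetical and do not appear in this disclosure; the
sketch lists coding-strand sequences only and ignores practical
limits on pool size.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return every DNA codon that encodes a one-letter amino acid code."""
    codons = sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)
    if not codons:
        raise ValueError(f"unknown amino acid: {amino_acid!r}")
    return codons


def degenerate_primer_pool(peptide):
    """Hypothetical helper: list every coding-strand oligonucleotide that
    could encode `peptide`, taking one codon choice per residue."""
    choices = [codons_for(aa) for aa in peptide]
    return ["".join(combo) for combo in product(*choices)]


# Met-Lys-Trp is an illustrative peptide, not taken from any SEQ ID.
pool = degenerate_primer_pool("MKW")
print(len(pool), pool)
```

Because lysine has two codons while methionine and tryptophan
have one each, the pool for this example holds two members.
Residues such as leucine, serine and arginine have six codons
each, so pool size grows quickly with peptide length.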

The terms "restriction endonucleases" and "restriction
enzymes" refer to bacterial enzymes which cut double-
stranded DNA at or near a specific nucleotide sequence.

The term "gene" means the segment of DNA involved in
producing a polypeptide chain; it includes regions preceding
and following the coding region (leader and trailer) as well
as intervening sequences (introns) between individual coding
segments (exons).

A coding sequence is "operably linked" to another coding 
sequence when RNA polymerase will transcribe the two coding 
sequences into a single mRNA, which is then translated into 
a single polypeptide having amino acids derived from both
coding sequences. The coding sequences need not be
contiguous to one another so long as the expressed sequences
ultimately process to produce the desired protein.

"Recombinant" enzymes refer to enzymes produced by
recombinant DNA techniques; i.e., produced from cells
transformed by an exogenous DNA construct encoding the
desired enzyme. "Synthetic" enzymes are those prepared by
chemical synthesis.

A DNA "coding sequence of" or a "nucleotide sequence
encoding" a particular enzyme is a DNA sequence which is
transcribed and translated into an enzyme when placed under
the control of appropriate regulatory sequences.

The term "thermostable phosphatase" refers to an enzyme 
which is stable to heat and heat-resistant and catalyzes the 
removal of phosphate groups from organophosphate ester 
compounds. Reference to "thermostable phosphatases" includes 
alkaline phosphatases, phosphodiesterases and phytases.

The phosphatase enzymes of the present invention cannot
become irreversibly denatured (inactivated) when subjected to
the elevated temperatures for the time necessary to effect
the hydrolysis of a phosphate group from an organophosphate
ester compound. Irreversible denaturation for purposes
herein refers to permanent and complete loss of enzymatic
activity. The phosphatase enzymes do not become irreversibly
denatured from exposure to temperatures in a range from about
60°C to about 113°C or more. The extreme thermostability of
the phosphatase enzymes provides additional advantages over
previously characterized thermostable enzymes. Prior to the
present invention, efficient hydrolysis of phosphate groups
at temperatures as high as 100°C had not been demonstrated.
No thermostable phosphatase had been described for this
purpose.

In accordance with an aspect of the present invention,
there are provided isolated nucleic acids (polynucleotides)
which encode for the mature enzymes having the deduced amino
acid sequences of Figures 1-9 (SEQ ID NOS:28-36).

In accordance with another aspect of the present
invention, there are provided isolated polynucleotides
encoding the enzymes of the present invention. The deposited
material is a mixture of genomic clones comprising DNA
encoding an enzyme of the present invention. Each genomic
clone comprising the respective DNA has been inserted into a
pBluescript vector (Stratagene, La Jolla, CA). The deposit
has been deposited with the American Type Culture Collection,
12301 Parklawn Drive, Rockville, Maryland 20852, USA, on May
10, 1996 and assigned ATCC Deposit No. 97536.

The deposit(s) have been made under the terms of the
Budapest Treaty on the International Recognition of the
Deposit of Micro-organisms for Purposes of Patent Procedure.
The strains will be irrevocably and without restriction or
condition released to the public upon the issuance of a
patent. These deposits are provided merely as convenience to
those of skill in the art and are not an admission that a
deposit be required under 35 U.S.C. §112. The sequences of
the polynucleotides contained in the deposited materials, as
well as the amino acid sequences of the polypeptides encoded
thereby, are controlling in the event of any conflict with
any description of sequences herein. A license may be
required to make, use or sell the deposited materials, and no
such license is hereby granted.

The polynucleotides of this invention were originally
recovered from genomic gene libraries derived from the
following organisms:

Ammonifex degensii KC4 is a eubacterium from the genus
Ammonifex. It was isolated in Java, Indonesia. It is a
gram-negative chemolithoautotroph. It grows optimally at
70°C in a low-salt culture medium at pH 7 with 0.2% nitrate
as a substrate and H2/CO2 in gas phase.

Methanococcus igneus Kol5 is a Euryarchaeota isolated
from Kolbeinsey Ridge in the north of Iceland. It grows
optimally at 85°C and pH 7.0 in a high-salt marine medium
with H2/CO2 in gas phase.

Aquifex pyrophilus KOL5A is a marine bacterium isolated
from the Kolbeinsey Ridge in the north of Iceland. It is a
gram-negative, rod-shaped, strictly chemolithoautotrophic,
knall gas bacterium, and a denitrifier. It grows optimally
at 85°C in high-salt marine medium at pH 6.8 with O2 as a
substrate and H2/CO2 + 0.5% O2 in gas phase.

Thermococcus alcaliphilus AEDII12RA is from the genus
Thermococcus. AEDII12RA grows optimally at 85°C, pH 9.5 in
a high-salt medium (marine) containing polysulfides and yeast
extract as substrates and N2 in gas phase.

Thermococcus celer is a Euryarchaeota. It grows
optimally at 85°C and pH 6.0 in a high-salt marine medium
containing elemental sulfur, yeast extract, and peptone as
substrates and N2 in gas phase.

Thermococcus GU5L5 is a Euryarchaeota isolated from the
Guaymas Basin in Mexico. It grows optimally at 85°C and pH
6.0 in a high-salt marine medium containing 1% elemental
sulfur, 0.4% yeast extract, and 0.5% peptone as substrates
with N2 in gas phase.

OC9a-27A3A is a bacterium of unknown etiology obtained
from Yellowstone National Park and maintained as a pure
culture. It grows well on a TK6 medium and has cellulose
degrader activity. Further, it codes for an alkaline
phosphatase having greater than 50% polypeptide identity and
greater than 32% polynucleotide identity to each of Bombyx
mori and Escherichia coli C alkaline phosphatase precursors,
which is significant homology. Thus, it is expected that
OC9a-27A3A can be cloned and expressed readily in Escherichia
coli C in place of its native alkaline phosphatase precursor.

M11TL is a new species of Desulfurococcus isolated from
Diamond Pool in Yellowstone National Park. M11TL grows
heterotrophically by fermentation of different organic
materials (sulfur is not necessary) and forms grape-like
aggregates. The organism grows optimally at 85°C to 88°C and
pH 7.6 in a low-salt medium containing yeast extract,
peptone, and gelatin as substrates with an N2/CO2 gas phase.

Thermococcus CL-2 is a Euryarchaeota isolated from the
North Cleft Segment of the Juan de Fuca Ridge. It grows
optimally at 88°C in a salt medium with an argon atmosphere.

Aquifex VF-5 is a marine bacterium isolated from a beach
in Vulcano, Italy. It is a gram-negative, rod-shaped,
strictly chemolithoautotrophic, knall gas bacterium. It
grows optimally from 85-90°C in high-salt marine medium at pH
6.8 with O2 as a substrate and H2/CO2 + 0.5% O2 in gas phase.


Accordingly, the polynucleotides and enzymes encoded
thereby are identified by the organism from which they were
isolated, and are sometimes hereinafter referred to as "KC4"
(Figure 1 and SEQ ID NOS:19 and 28), "Kol5" (Figure 2 and SEQ
ID NOS:20 and 29), "AEDII12RA" (Figure 3 and SEQ ID NOS:21
and 30), "Celer" (Figure 4 and SEQ ID NOS:22 and 31), "GU5L5"
(Figure 5 and SEQ ID NOS:23 and 32), "OC9a" (Figure 6 and SEQ
ID NOS:24 and 33), "M11TL" (Figure 7 and SEQ ID NOS:25 and
34), "CL-2" (Figure 8 and SEQ ID NOS:26 and 35) and "VF-5"
(Figure 9 and SEQ ID NOS:27 and 36).
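
For convenience, the correspondence between clone labels,
figures and sequence identifiers recited above follows a
regular pattern and may be held in a small lookup table. The
sketch below is illustrative only; the names CLONE_LABELS and
clone_index and the dictionary keys are hypothetical, and the
reading of the first SEQ ID NO of each pair as the DNA sequence
and the second as the deduced amino acid sequence is taken from
the passages herein that cite SEQ ID NOS:19-27 for coding
sequences and SEQ ID NOS:28-36 for amino acid sequences.

```python
# Clone labels in figure order (Figure 1 through Figure 9).
CLONE_LABELS = [
    "KC4", "Kol5", "AEDII12RA", "Celer", "GU5L5",
    "OC9a", "M11TL", "CL-2", "VF-5",
]


def clone_index():
    """Map each clone label to its figure number and SEQ ID NOS.

    Figure n pairs with DNA SEQ ID NO 18 + n and with amino acid
    SEQ ID NO 27 + n, matching the pairs listed in the text.
    """
    index = {}
    for n, label in enumerate(CLONE_LABELS, start=1):
        index[label] = {
            "figure": n,
            "dna_seq_id": 18 + n,
            "protein_seq_id": 27 + n,
        }
    return index


for label, entry in clone_index().items():
    print(label, entry)
```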

The polynucleotides and polypeptides of the present
invention show identity at the nucleotide and protein level
to known genes and proteins encoded thereby as shown in
Table 1.






WO 97/48416 



PCT/US97/10784



Table 1

Clone                       Gene/Protein with                         Protein   Nucleic Acid
                            Closest Homology                          Identity  Identity
--------------------------  ----------------------------------------  --------  ------------
Ammonifex degensii          Yarrowia lipolytica, Candida lipolytica,  47%       24%
KC4-3A1A                    acid phosphatase

Ammonifex degensii          Saccharomyces cerevisiae, hypothetical    54%       26%
KC4-3A1A                    protein YBR094w

Methanococcus igneus        Yarrowia lipolytica, Candida lipolytica,  45%       25%
Kol5-9A1A                   acid phosphatase

Methanococcus igneus        Saccharomyces cerevisiae, hypothetical    52%       25%
Kol5-9A1A                   protein YBR094w, hypothetical protein
                            YBR0821

Thermococcus alcaliphilus   No homology found
AEDII12RA-18A

Thermococcus celer 25A1A    No homology found

Thermococcus GU5L5-26A1A    Bacillus subtilis, alkaline phosphatase   5?%       38%
                            IV precursor, alkaline
                            phosphomonoesterase, glycerophosphatase,
                            and phosphomonoesterase

Thermococcus GU5L5-26A1A    Bacillus subtilis, alkaline phosphatase   58%       37%
                            III precursor

OC9a-27A3A                  Bombyx mori (silkworm), alkaline          54%       33%
                            phosphatase precursor

OC9a-27A3A                  Escherichia coli C, alkaline              53%       34%
                            phosphatase precursor

M11TL-29A1A                 Rhodobacter capsulatus, hypothetical      43%       24%
                            protein B

Thermococcus CL2-30A1A      Yarrowia lipolytica, Candida lipolytica,  49%       27%
                            acid phosphatase

Thermococcus CL2-30A1A      Saccharomyces cerevisiae, hypothetical    50%       25%
                            protein YBR094w, hypothetical protein
                            YBR0821

Aquifex VF5-34A1A           Escherichia coli, suppressor protein      57%       34%
                            suhB

All of the clones identified in Table 1 encode
polypeptides which have phosphatase activity.

One means for isolating the nucleic acid molecules
encoding the enzymes of the present invention is to probe a
gene library with a natural or artificially designed probe
using art recognized procedures (see, for example: Current
Protocols in Molecular Biology, Ausubel F.M. et al. (EDS.)
Green Publishing Company Assoc. and John Wiley Interscience,
New York, 1989, 1992). It is appreciated by one skilled in
the art that the polynucleotides of SEQ ID NOS:1-18, or
fragments thereof (comprising at least 12 contiguous
nucleotides), are particularly useful probes. Other
particularly useful probes for this purpose are hybridizable
fragments of the sequences of SEQ ID NOS:19-27 (i.e.,
comprising at least 12 contiguous nucleotides).
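
As a non-limiting sketch, candidate probe fragments of at
least 12 contiguous nucleotides can be enumerated by sliding a
fixed-length window along a sequence. The function name
contiguous_fragments and the 18-nucleotide example sequence
are hypothetical and are not taken from any SEQ ID NO herein.

```python
def contiguous_fragments(sequence, length=12):
    """Yield (offset, fragment) for every window of `length` contiguous
    nucleotides in `sequence`. Sequences shorter than `length` yield nothing."""
    if length < 1:
        raise ValueError("length must be at least 1")
    for start in range(len(sequence) - length + 1):
        yield start, sequence[start:start + length]


# Made-up 18-nt sequence used only to show the windowing.
example = "ATGGCTAAAGGTCTGCGT"
for offset, fragment in contiguous_fragments(example):
    print(offset, fragment)
```

A sequence of N nucleotides gives N - 11 such 12-mers; whether a
given fragment hybridizes usefully is an experimental question
that this enumeration does not address.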

With respect to nucleic acid sequences which hybridize
to specific nucleic acid sequences disclosed herein,
hybridization may be carried out under conditions of reduced
stringency, medium stringency or even stringent conditions.
As an example of oligonucleotide hybridization, a polymer
membrane containing immobilized denatured nucleic acids is
first prehybridized for 30 minutes at 45°C in a solution
consisting of 0.9 M NaCl, 50 mM NaH2PO4, pH 7.0, 5.0 mM
Na2EDTA, 0.5% SDS, 10X Denhardt's, and 0.5 mg/mL
polyriboadenylic acid. Approximately 2 X 10^7 cpm (specific
activity 4-9 X 10^8 cpm/ug) of 32P end-labeled oligonucleotide
probe are then added to the solution. After 12-16 hours of
incubation, the membrane is washed for 30 minutes at room
temperature in 1X SET (150 mM NaCl, 20 mM Tris hydrochloride,
pH 7.8, 1 mM Na2EDTA) containing 0.5% SDS, followed by a 30
minute wash in fresh 1X SET at (Tm less 10°C) for the
oligonucleotide probe. The membrane is then exposed to
autoradiographic film for detection of hybridization signals.
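
The second wash temperature above is defined relative to the
Tm of the oligonucleotide probe, which this description does
not specify how to compute. Purely as an illustrative
assumption, the sketch below estimates Tm with the Wallace
rule of thumb for short oligonucleotides (2°C per A or T plus
4°C per G or C) and subtracts 10°C. The function names are
hypothetical, and other Tm estimates may be more appropriate
for a particular probe and buffer.

```python
def wallace_tm(oligo):
    """Estimate Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C).

    This rule is an assumption of this sketch, not a method taken from
    the description; it is a rough guide for short oligonucleotides.
    """
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("oligo must contain only A, C, G and T")
    return 2 * at + 4 * gc


def stringent_wash_temperature(oligo, offset_c=10):
    """Return the wash temperature as the estimated Tm less `offset_c` degrees C."""
    return wallace_tm(oligo) - offset_c


probe = "ATGGCTAAAGGTCTGCGT"  # made-up 18-nt probe, for illustration only
print(wallace_tm(probe), stringent_wash_temperature(probe))
```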

Stringent conditions means hybridization will occur only
if there is at least 90% identity, preferably at least 95%
identity and most preferably at least 97% identity between
the sequences. Further, it is understood that a fragment of
a 100 bps sequence that is 95 bps in length has 95% identity
with the 100 bps sequence from which it is obtained. See J.
Sambrook et al., Molecular Cloning, A Laboratory Manual, 2d
Ed., Cold Spring Harbor Laboratory (1989), which is hereby
incorporated by reference in its entirety.

As used herein, a first DNA (RNA) sequence is at least
70% and preferably at least 80% identical to another DNA
(RNA) sequence if there is at least 70% and preferably at
least 80% or 90% identity, respectively, between the bases
of the first sequence and the bases of the other sequence,
when properly aligned with each other, for example when
aligned by BLASTN.
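
The identity figures above can be illustrated with a
simplified calculation over two already-aligned sequences. The
sketch below is not BLASTN and makes no claim to reproduce its
scoring; it assumes the alignment is supplied with gap
characters and divides matching positions by the full
alignment length, which reproduces the example herein of a 95
bps fragment having 95% identity with its 100 bps parent. The
function names and tier labels are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding identical, non-gap bases."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


def stringency_tier(identity):
    """Classify an identity value against the 90/95/97% thresholds."""
    if identity >= 97:
        return "at least 97% (most preferred)"
    if identity >= 95:
        return "at least 95% (preferred)"
    if identity >= 90:
        return "at least 90%"
    return "below 90%"


reference = "ACGT" * 25                  # made-up 100-base sequence
fragment = reference[:95] + "-" * 5      # 95-base fragment padded with gaps
identity = percent_identity(fragment, reference)
print(identity, stringency_tier(identity))
```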

The present invention relates to polynucleotides which
differ from the reference polynucleotide such that the
differences are silent; for example, the amino acid sequence
encoded by the polynucleotides is the same. The present
invention also relates to nucleotide changes which result in
amino acid substitutions, additions, deletions, fusions and
truncations in the polypeptide encoded by the reference
polynucleotide. In a preferred aspect of the invention these
polypeptides retain the same biological action as the
polypeptide encoded by the reference polynucleotide.

The polynucleotides of this invention were recovered
from genomic gene libraries from the organisms listed in
Table 1. Gene libraries were generated in either a
Lambda ZAP II or a pBluescript cloning vector (Stratagene
Cloning Systems). Mass excisions were performed on these
libraries to generate libraries in the pBluescript phagemid.
Libraries were generated and excisions were performed
according to the protocols/methods hereinafter described.

The polynucleotides of the present invention may be in
the form of RNA or DNA, which DNA includes cDNA, genomic DNA,
and synthetic DNA. The DNA may be double-stranded or
single-stranded, and if single stranded may be the coding
strand or non-coding (anti-sense) strand. The coding sequences
which encode the mature enzymes may be identical to the coding
sequences shown in Figures 1-9 (SEQ ID NOS: 19-27) or may be
a different coding sequence which coding sequence, as a
result of the redundancy or degeneracy of the genetic code,
encodes the same mature enzymes as the DNA of Figures 1-9
(SEQ ID NOS: 19-27).
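
The degeneracy of the genetic code referred to above may be
illustrated by a minimal sketch, assuming the standard genetic
code. The short sequences in the usage note are hypothetical
and are not taken from SEQ ID NOS: 19-27; the helper names are
likewise chosen only for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding strand, reading frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def encode_same_protein(dna_a: str, dna_b: str) -> bool:
    """True when two different coding sequences give the same amino acid sequence."""
    return translate(dna_a) == translate(dna_b)
```

For instance, the hypothetical sequences ATGGCTAAA and
ATGGCAAAG differ at two bases yet both translate to
Met-Ala-Lys, so the differences are silent.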

The polynucleotide which encodes for the mature enzyme
of Figures 1-9 (SEQ ID NOS: 28-36) may include, but is not
limited to: only the coding sequence for the mature enzyme;
the coding sequence for the mature enzyme and additional
coding sequence such as a leader sequence or a proprotein
sequence; the coding sequence for the mature enzyme (and
optionally additional coding sequence) and non-coding
sequence, such as introns or non-coding sequence 5' and/or 3'
of the coding sequence for the mature enzyme.

Thus, the term "polynucleotide encoding an enzyme
(protein)" encompasses a polynucleotide which includes only
coding sequence for the enzyme as well as a polynucleotide 
which includes additional coding and/or non-coding sequence. 

The present invention further relates to variants of the
hereinabove described polynucleotides which encode for
fragments, analogs and derivatives of the enzymes having the
deduced amino acid sequences of Figures 1-9 (SEQ ID NOS:
28-36). The variant of the polynucleotide may be a naturally
occurring allelic variant of the polynucleotide or a
non-naturally occurring variant of the polynucleotide.

Thus, the present invention includes polynucleotides
encoding the same mature enzymes as shown in Figures 1-9 (SEQ
ID NOS: 19-27) as well as variants of such polynucleotides
which variants encode for a fragment, derivative or analog of
the enzymes of Figures 1-9 (SEQ ID NOS: 19-27). Such
nucleotide variants include deletion variants, substitution
variants and addition or insertion variants.

As hereinabove indicated, the polynucleotides may have
a coding sequence which is a naturally occurring allelic
variant of the coding sequences shown in Figures 1-9 (SEQ ID
NOS: 19-27). As known in the art, an allelic variant is an
alternate form of a polynucleotide sequence which may have a
substitution, deletion or addition of one or more
nucleotides, which does not substantially alter the function
of the encoded enzyme. Also, using directed and other
evolution strategies, one may make very minor changes in DNA
sequence which can result in major changes in function.

Fragments of the full length gene of the present
invention may be used as hybridization probes for a cDNA or
a genomic library to isolate the full length DNA and to
isolate other DNAs which have a high sequence similarity to
the gene or similar biological activity. Probes of this type
preferably have at least 10, preferably at least 15, and even
more preferably at least 30 bases and may contain, for
example, at least 50 or more bases. In fact, probes of this
type having at least up to 150 bases or greater may be
preferably utilized. The probe may also be used to identify
a DNA clone corresponding to a full length transcript and a
genomic clone or clones that contain the complete gene
including regulatory and promoter regions, exons and introns.
An example of a screen comprises isolating the coding region
of the gene by using the known DNA sequence to synthesize an
oligonucleotide probe. Labeled oligonucleotides having a
sequence complementary or identical to that of the gene or
portion of the gene sequences of the present invention are
used to screen a library of genomic DNA to determine which
members of the library the probe hybridizes to.
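
The logic of such a screen can be sketched in software as a
search for library members containing a stretch identical or
complementary to the probe. This is a simplification offered
only as an illustration: exact string matching stands in for
hybridization, and the clone names and sequences in the usage
note are hypothetical.

```python
# Translation table for complementing an unambiguous DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def screen_library(library: dict[str, str], probe: str) -> list[str]:
    """Names of library members containing the probe or its reverse complement.

    `library` maps a clone name to the sequence of one strand of its insert.
    """
    probe = probe.upper()
    probe_rc = reverse_complement(probe)
    return [
        name
        for name, insert in library.items()
        if probe in insert.upper() or probe_rc in insert.upper()
    ]
```

For example, screening the hypothetical library
{"clone1": "GGGATGCCCTTT", "clone2": "AAAGGGCATCCC",
"clone3": "TTTTTTTT"} with the probe ATGCCC selects clone1
(identical stretch) and clone2 (complementary stretch).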

It is also appreciated that such probes can be and are 
preferably labeled with an analytically detectable reagent to 
facilitate identification of the probe. Useful reagents 
include but are not limited to radioactivity, fluorescent 
dyes or enzymes capable of catalyzing the formation of a 
detectable product. The probes are thus useful to isolate 
complementary copies of DNA from other sources or to screen 
such sources for related sequences. 

The present invention further relates to 
polynucleotides which hybridize to the hereinabove-described 
sequences if there is at least 70%, preferably at least 90%, 
and more preferably at least 95% identity between the 
sequences. (As indicated above, 70% identity would include 
within such definition a 70 bps fragment taken from a 100 bp 
polynucleotide, for example.) The present invention
particularly relates to polynucleotides which hybridize under
stringent conditions to the hereinabove-described
polynucleotides. As herein used, the term "stringent
conditions" means hybridization will occur only if there is
at least 95% and preferably at least 97% identity between the
sequences. The polynucleotides which hybridize to the
hereinabove described polynucleotides in a preferred
embodiment encode enzymes which either retain substantially
the same biological function or activity as the mature enzyme
encoded by the DNA of Figures 1-9 (SEQ ID NOS: 19-27). In
referring to identity in the case of hybridization, as known
in the art, such identity refers to the complementarity of
two polynucleotide segments.

Alternatively, the polynucleotide may have at least 15
bases, preferably at least 30 bases, and more preferably at
least 50 bases which hybridize to any part of a
polynucleotide of the present invention and which has an
identity thereto, as hereinabove described, and which may or
may not retain activity. For example, such polynucleotides
may be employed as probes for the polynucleotides of SEQ ID
NOS: 19-27, for example, for recovery of the polynucleotide
or as a diagnostic probe or as a PCR primer.

Thus, the present invention is directed to
polynucleotides having at least a 70% identity, preferably at
least 90% identity and more preferably at least a 95%
identity to a polynucleotide which encodes the enzymes of SEQ
ID NOS: 28-36 as well as fragments thereof, which fragments
have at least 15 bases, preferably at least 30 bases, more
preferably at least 50 bases and most preferably fragments
having up to at least 150 bases or greater, which fragments
are at least 90% identical, preferably at least 95% identical
and most preferably at least 97% identical to any portion of
a polynucleotide of the present invention.

The present invention further relates to enzymes which
have the deduced amino acid sequences of Figures 1-9 (SEQ ID
NOS: 28-36) as well as fragments, analogs and derivatives of
such enzymes.

The terms "fragment," "derivative" and "analog" when
referring to the enzymes of Figures 1-9 (SEQ ID NOS: 28-36)
mean enzymes which retain essentially the same biological
function or activity as such enzymes. Thus, an analog
includes a proprotein which can be activated by cleavage of
the proprotein portion to produce an active mature enzyme.

The enzymes of the present invention may be a
recombinant enzyme, a natural enzyme or a synthetic enzyme,
preferably a recombinant enzyme.

The fragment, derivative or analog of the enzymes of
Figures 1-9 (SEQ ID NOS: 28-36) may be (i) one in which one or
more of the amino acid residues are substituted with a
conserved or non-conserved amino acid residue (preferably a
conserved amino acid residue) and such substituted amino acid
residue may or may not be one encoded by the genetic code, or
(ii) one in which one or more of the amino acid residues
includes a substituent group, or (iii) one in which the
mature enzyme is fused with another compound, such as a
compound to increase the half-life of the enzyme (for
example, polyethylene glycol), or (iv) one in which the
additional amino acids are fused to the mature enzyme, such
as a leader or secretory sequence or a sequence which is
employed for purification of the mature enzyme or a
proprotein sequence. Such fragments, derivatives and analogs
are deemed to be within the scope of those skilled in the art
from the teachings herein.

The enzymes and polynucleotides of the present invention
are preferably provided in an isolated form, and preferably
are purified to homogeneity.

The term "isolated" means that the material is removed
from its original environment (e.g., the natural environment
if it is naturally occurring). For example, a
naturally-occurring polynucleotide or enzyme present in a
living animal is not isolated, but the same polynucleotide or
enzyme, separated from some or all of the coexisting materials
in the natural system, is isolated. Such polynucleotides could
be part of a vector and/or such polynucleotides or enzymes
could be part of a composition, and still be isolated in that
such vector or composition is not part of its natural
environment.

The enzymes of the present invention include the enzymes
of SEQ ID NOS: 28-36 (in particular the mature enzyme) as
well as enzymes which have at least 70% similarity
(preferably at least 70% identity) to the enzymes of SEQ ID
NOS: 28-36 and more preferably at least 90% similarity (more
preferably at least 90% identity) to the enzymes of SEQ ID
NOS: 28-36 and still more preferably at least 95% similarity
(still more preferably at least 95% identity) to the enzymes
of SEQ ID NOS: 28-36 and also include portions of such
enzymes with such portion of the enzyme generally containing
at least 30 amino acids and more preferably at least 50 amino
acids and most preferably at least up to 150 amino acids.

As known in the art, "similarity" between two enzymes is
determined by comparing the amino acid sequence and its
conserved amino acid substitutes of one enzyme to the
sequence of a second enzyme. The definition of 70%
similarity would include a 70 amino acid sequence fragment of
a 100 amino acid sequence, for example, or a 70 amino acid
sequence obtained by sequentially or randomly deleting 30
amino acids from the 100 amino acid sequence.

A variant, i.e. a "fragment", "analog" or "derivative"
polypeptide, and reference polypeptide may differ in amino
acid sequence by one or more substitutions, additions,
deletions, fusions and truncations, which may be present in
any combination.

Among preferred variants are those that vary from a
reference by conservative amino acid substitutions. Such
substitutions are those that substitute a given amino acid in
a polypeptide by another amino acid of like characteristics.
Typically seen as conservative substitutions are the
replacements, one for another, among the aliphatic amino
acids Ala, Val, Leu and Ile; interchange of the hydroxyl
residues Ser and Thr, exchange of the acidic residues Asp and
Glu, substitution between the amide residues Asn and Gln,
exchange of the basic residues Lys and Arg and replacements
among the aromatic residues Phe, Tyr.
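
As a hedged illustration of how identity and similarity might
be tallied, the sketch below counts identical positions and
positions differing only by a conservative substitution. It
assumes two pre-aligned, ungapped sequences of equal length in
one-letter code, and it takes the aliphatic group to be Ala,
Val, Leu and Ile, as is conventional. The function names and
the scoring scheme are assumptions of this example, not a
method prescribed herein.

```python
# Conservative substitution groups, one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    set("AVLI"),  # aliphatic: Ala, Val, Leu, Ile
    set("ST"),    # hydroxyl: Ser, Thr
    set("DE"),    # acidic: Asp, Glu
    set("NQ"),    # amide: Asn, Gln
    set("KR"),    # basic: Lys, Arg
    set("FY"),    # aromatic: Phe, Tyr
]


def is_conservative(a: str, b: str) -> bool:
    """True when a and b differ but fall within the same group."""
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def identity_and_similarity(reference: str, variant: str) -> tuple[float, float]:
    """Return (percent identity, percent similarity) for equal-length sequences."""
    if len(reference) != len(variant) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(zip(reference.upper(), variant.upper()))
    identical = sum(1 for a, b in pairs if a == b)
    conservative = sum(1 for a, b in pairs if is_conservative(a, b))
    n = len(pairs)
    return 100.0 * identical / n, 100.0 * (identical + conservative) / n
```

For the hypothetical pair AVLISTDEKR and AVLLSTDDKG, seven of
ten positions are identical and two more differ conservatively
(Ile to Leu, Glu to Asp), giving 70% identity and 90%
similarity.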

Most highly preferred are variants which retain the same
biological function and activity as the reference polypeptide
from which they vary.

Fragments or portions of the enzymes of the present 
invention may be employed for producing the corresponding 
full-length enzyme by peptide synthesis; therefore, the 
fragments may be employed as intermediates for producing the 
full-length enzymes. Fragments or portions of the 
polynucleotides of the present invention may be used to 
synthesize full-length polynucleotides of the present 
invention. 

The present invention also relates to vectors which 
include polynucleotides of the present invention, host cells 
which are genetically engineered with vectors of the
invention and the production of enzymes of the invention by 
recombinant techniques. 

Host cells are genetically engineered (transduced or
transformed or transfected) with the vectors of this
invention which may be, for example, a cloning vector or
an expression vector. The vector may be, for example, in the
form of a plasmid, a phage, etc. The engineered host cells
can be cultured in conventional nutrient media modified as
appropriate for activating promoters, selecting transformants
or amplifying the genes of the present invention. The
culture conditions, such as temperature, pH and the like, are
those previously used with the host cell selected for
expression, and will be apparent to the ordinarily skilled
artisan.

The polynucleotides of the present invention may be
employed for producing enzymes by recombinant techniques.
Thus, for example, the polynucleotide may be included in any
one of a variety of expression vectors for expressing an
enzyme. Such vectors include chromosomal, nonchromosomal and
synthetic DNA sequences, e.g., derivatives of SV40; bacterial
plasmids; phage DNA; baculovirus; yeast plasmids; vectors
derived from combinations of plasmids and phage DNA, viral
DNA such as vaccinia, adenovirus, fowl pox virus, and
pseudorabies. However, any other vector may be used as long
as it is replicable and viable in the host.

The appropriate DNA sequence may be inserted into the
vector by a variety of procedures. In general, the DNA
sequence is inserted into an appropriate restriction
endonuclease site(s) by procedures known in the art. Such
procedures and others are deemed to be within the scope of
those skilled in the art.

The DNA sequence in the expression vector is operatively
linked to an appropriate expression control sequence(s)
(promoter) to direct mRNA synthesis. As representative
examples of such promoters, there may be mentioned: LTR or
SV40 promoter, the E. coli lac or trp, the phage lambda PL
promoter and other promoters known to control expression of
genes in prokaryotic or eukaryotic cells or their viruses.
The expression vector also contains a ribosome binding site
for translation initiation and a transcription terminator.
The vector may also include appropriate sequences for
amplifying expression.

In addition, the expression vectors preferably contain
one or more selectable marker genes to provide a phenotypic
trait for selection of transformed host cells such as
dihydrofolate reductase or neomycin resistance for eukaryotic
cell culture, or such as tetracycline or ampicillin
resistance in E. coli.

The vector containing the appropriate DNA sequence as
hereinabove described, as well as an appropriate promoter or
control sequence, may be employed to transform an appropriate
host to permit the host to express the protein.

As representative examples of appropriate hosts, there
may be mentioned: bacterial cells, such as E. coli,
Streptomyces, Bacillus subtilis; fungal cells, such as yeast;
insect cells such as Drosophila S2 and Spodoptera Sf9; animal
cells such as CHO, COS or Bowes melanoma; adenoviruses; plant
cells, etc. The selection of an appropriate host is deemed
to be within the scope of those skilled in the art from the
teachings herein.

More particularly, the present invention also includes 
recombinant constructs comprising one or more of the 
sequences as broadly described above. The constructs
comprise a vector, such as a plasmid or viral vector, into 
which a sequence of the invention has been inserted, in a 
forward or reverse orientation. In a preferred aspect of this 
embodiment, the construct further comprises regulatory 
sequences, including, for example, a promoter, operably 
linked to the sequence. Large numbers of suitable vectors
and promoters are known to those of skill in the art, and are
commercially available. The following vectors are provided
by way of example: Bacterial: pQE70, pQE60, pQE-9 (Qiagen),
pBluescript II KS, ptrc99a, pKK223-3, pDR540, pRIT2T
(Pharmacia); Eukaryotic: pXT1, pSG5 (Stratagene), pSVK3, pBPV,
pMSG, pSVL SV40 (Pharmacia). However, any other plasmid or
vector may be used as long as it is replicable and viable
in the host.

Promoter regions can be selected from any desired gene
using CAT (chloramphenicol transferase) vectors or other
vectors with selectable markers. Two appropriate vectors are
pKK232-8 and pCM7. Particular named bacterial promoters
include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp.
Eukaryotic promoters include CMV immediate early, HSV
thymidine kinase, early and late SV40, LTRs from retrovirus,
and mouse metallothionein-I. Selection of the appropriate
vector and promoter is well within the level of ordinary
skill in the art.

In a further embodiment, the present invention relates
to host cells containing the above-described constructs. The
host cell can be a higher eukaryotic cell, such as a
mammalian cell, or a lower eukaryotic cell, such as a yeast
cell, or the host cell can be a prokaryotic cell, such as a
bacterial cell. Introduction of the construct into the host
cell can be effected by calcium phosphate transfection,
DEAE-Dextran mediated transfection, or electroporation (Davis,
L., Dibner, M., Battey, I., Basic Methods in Molecular
Biology, (1986)).

The constructs in host cells can be used in a
conventional manner to produce the gene product encoded by
the recombinant sequence. Alternatively, the enzymes of the
invention can be synthetically produced by conventional
peptide synthesizers.

Mature proteins can be expressed in mammalian cells,
yeast, bacteria, or other cells under the control of
appropriate promoters. Cell-free translation systems can
also be employed to produce such proteins using RNAs derived
from the DNA constructs of the present invention.
Appropriate cloning and expression vectors for use with
prokaryotic and eukaryotic hosts are described by Sambrook et
al., Molecular Cloning: A Laboratory Manual, Second Edition,
Cold Spring Harbor, N.Y., (1989), the disclosure of which is
hereby incorporated by reference.

Transcription of the DNA encoding the enzymes of the 
present invention by higher eukaryotes is increased by 
inserting an enhancer sequence into the vector. Enhancers 
are cis-acting elements of DNA, usually about from 10 to 300 
bp that act on a promoter to increase its transcription. 
Examples include the SV40 enhancer on the late side of the 
replication origin bp 100 to 270, a cytomegalovirus early 
promoter enhancer, the polyoma enhancer on the late side of 
the replication origin, and adenovirus enhancers. 

Generally, recombinant expression vectors will include
origins of replication and selectable markers permitting
transformation of the host cell, e.g., the ampicillin
resistance gene of E. coli and S. cerevisiae TRP1 gene, and
a promoter derived from a highly-expressed gene to direct
transcription of a downstream structural sequence. Such
promoters can be derived from operons encoding glycolytic
enzymes such as 3-phosphoglycerate kinase (PGK), α-factor,
acid phosphatase, or heat shock proteins, among others. The
heterologous structural sequence is assembled in appropriate
phase with translation initiation and termination sequences,
and preferably, a leader sequence capable of directing
secretion of translated enzyme. Optionally, the heterologous
sequence can encode a fusion enzyme including an N-terminal
identification peptide imparting desired characteristics,
e.g., stabilization or simplified purification of expressed
recombinant product.

Useful expression vectors for bacterial use are
constructed by inserting a structural DNA sequence encoding
a desired protein together with suitable translation
initiation and termination signals in operable reading phase
with a functional promoter. The vector will comprise one or
more phenotypic selectable markers and an origin of
replication to ensure maintenance of the vector and to, if
desirable, provide amplification within the host. Suitable
prokaryotic hosts for transformation include E. coli,
Bacillus subtilis, Salmonella typhimurium and various species
within the genera Pseudomonas, Streptomyces, and
Staphylococcus, although others may also be employed as a
matter of choice.

As a representative but nonlimiting example, useful
expression vectors for bacterial use can comprise a
selectable marker and bacterial origin of replication derived
from commercially available plasmids comprising genetic
elements of the well known cloning vector pBR322 (ATCC
37017). Such commercial vectors include, for example,
pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and
pGEM1 (Promega Biotec, Madison, WI, USA). These pBR322
"backbone" sections are combined with an appropriate promoter
and the structural sequence to be expressed.

Following transformation of a suitable host strain and
growth of the host strain to an appropriate cell density, the
selected promoter is induced by appropriate means (e.g.,
temperature shift or chemical induction) and cells are
cultured for an additional period.

Cells are typically harvested by centrifugation,
disrupted by physical or chemical means, and the resulting
crude extract retained for further purification.

Microbial cells employed in expression of proteins can
be disrupted by any convenient method, including freeze-thaw
cycling, sonication, mechanical disruption, or use of cell
lysing agents; such methods are well known to those skilled
in the art.

Various mammalian cell culture systems can also be
employed to express recombinant protein. Examples of
mammalian expression systems include the COS-7 lines of
monkey kidney fibroblasts, described by Gluzman, Cell, 23:175
(1981), and other cell lines capable of expressing a
compatible vector, for example, the C127, 3T3, CHO, HeLa and
BHK cell lines. Mammalian expression vectors will comprise
an origin of replication, a suitable promoter and enhancer,
and also any necessary ribosome binding sites,
polyadenylation site, splice donor and acceptor sites,
transcriptional termination sequences, and 5' flanking
nontranscribed sequences. DNA sequences derived from the
SV40 splice and polyadenylation sites may be used to provide
the required nontranscribed genetic elements.

The enzyme can be recovered and purified from
recombinant cell cultures by methods including ammonium
sulfate or ethanol precipitation, acid extraction, anion or
cation exchange chromatography, phosphocellulose
chromatography, hydrophobic interaction chromatography,
affinity chromatography, hydroxylapatite chromatography and
lectin chromatography. Protein refolding steps can be used,
as necessary, in completing configuration of the mature
protein. Finally, high performance liquid chromatography
(HPLC) can be employed for final purification steps.

The enzymes of the present invention may be a naturally
purified product, or a product of chemical synthetic
procedures, or produced by recombinant techniques from a
prokaryotic or eukaryotic host (for example, by bacterial,
yeast, higher plant, insect, and mammalian cells in culture).
Depending upon the host employed in a recombinant production
procedure, the enzymes of the present invention may be
glycosylated or may be non-glycosylated. Enzymes of the
invention may or may not also include an initial methionine
amino acid residue.

Phosphatases are a group of key enzymes in the removal
of phosphate groups from organophosphate ester compounds.
There are numerous phosphatases, including alkaline
phosphatases, phosphodiesterases and phytases.

The general application and definitions of such
compounds are discussed above under the background of the
invention section.

The present invention provides novel phosphatase enzymes
having enhanced thermostability. Such phosphatases are
beneficial in enzyme labeling processes and in certain
recombinant DNA techniques, such as in the dephosphorylation
of vector DNA prior to insert DNA ligation. The recombinant
phosphatase enzymes provide the proteins in a format amenable
to efficient production of pure enzyme, which can be utilized
in a variety of applications as described herein.

Antibodies generated against the enzymes corresponding
to a sequence of the present invention can be obtained by
direct injection of the enzymes into an animal or by
administering the enzymes to an animal, preferably a
nonhuman. The antibody so obtained will then bind the
enzymes themselves. In this manner, even a sequence encoding
only a fragment of the enzymes can be used to generate
antibodies binding the whole native enzymes. Such antibodies
can then be used to isolate the enzyme from cells expressing
that enzyme.

For preparation of monoclonal antibodies, any technique
which provides antibodies produced by continuous cell line
cultures can be used. Examples include the hybridoma
technique (Kohler and Milstein, Nature, 256:495-497, 1975),
the trioma technique, the human B-cell hybridoma technique
(Kozbor et al., Immunology Today 4:72, 1983), and the
EBV-hybridoma technique to produce human monoclonal antibodies
(Cole et al., in Monoclonal Antibodies and Cancer Therapy,
Alan R. Liss, Inc., pp. 77-96, 1985).

Techniques described for the production of single chain
antibodies (U.S. Patent 4,946,778) can be adapted to produce
single chain antibodies to immunogenic enzyme products of
this invention. Also, transgenic mice may be used to express
humanized antibodies to immunogenic enzyme products of this
invention.

Antibodies generated against an enzyme of the present
invention may be used in screening for similar enzymes from
other organisms and samples. Such screening techniques are
known in the art, for example, one such screening assay is
described in Sambrook and Maniatis, Molecular Cloning: A
Laboratory Manual (2d Ed.), vol. 2: Section 8.49, Cold Spring
Harbor Laboratory, 1989, which is hereby incorporated by
reference in its entirety.

The present invention will be further described with
reference to the following examples; however, it is to be
understood that the present invention is not limited to such
examples. All parts or amounts, unless otherwise specified,
are by weight.

In order to facilitate understanding of the following
examples certain frequently occurring methods and/or terms
will be described.

"Plasmids" are designated by a lower case "p" preceded
and/or followed by capital letters and/or numbers. The
starting plasmids herein are either commercially available,
publicly available on an unrestricted basis, or can be
constructed from available plasmids in accord with published
procedures. In addition, equivalent plasmids to those
described are known in the art and will be apparent to the
ordinarily skilled artisan.

"Digestion" of DNA refers to catalytic cleavage of the
DNA with a restriction enzyme that acts only at certain
sequences in the DNA. The various restriction enzymes used
herein are commercially available and their reaction
conditions, co-factors and other requirements were used as
would be known to the ordinarily skilled artisan. For
analytical purposes, typically 1 µg of plasmid or DNA
fragment is used with about 2 units of enzyme in about 20 µl
of buffer solution. For the purpose of isolating DNA
fragments for plasmid construction, typically 5 to 50 µg of
DNA are digested with 20 to 250 units of enzyme in a larger
volume. Appropriate buffers and substrate amounts for
particular restriction enzymes are specified by the
manufacturer. Incubation times of about 1 hour at 37°C are
ordinarily used, but may vary in accordance with the
supplier's instructions. After digestion the reaction is
electrophoresed directly on a polyacrylamide gel to isolate
the desired fragment.

Size separation of the cleaved fragments is performed
using 8 percent polyacrylamide gel described by Goeddel et
al., Nucleic Acids Res., 8:4057 (1980).

"Oligonucleotides" refers to either a single stranded
polydeoxynucleotide or two complementary polydeoxynucleotide
strands which may be chemically synthesized. Such synthetic
oligonucleotides have no 5' phosphate and thus will not
ligate to another oligonucleotide without adding a phosphate
with an ATP in the presence of a kinase. A synthetic
oligonucleotide will ligate to a fragment that has not been
dephosphorylated.

"Ligation" refers to the process of forming
phosphodiester bonds between two double stranded nucleic acid
fragments (Maniatis, T., et al., Id., p. 146). Unless
otherwise provided, ligation may be accomplished using known
buffers and conditions with 10 units of T4 DNA ligase
("ligase") per 0.5 µg of approximately equimolar amounts of
the DNA fragments to be ligated.
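
The ligase ratio given above (10 units per 0.5 µg of DNA) can
be applied to other DNA amounts as in the following sketch.
Linear scaling of the ratio is an assumption of this example,
as is the function name; actual amounts would follow the
enzyme supplier's instructions.

```python
# 10 units of T4 DNA ligase per 0.5 µg of DNA, i.e. 20 units per µg.
LIGASE_UNITS_PER_UG = 10 / 0.5


def ligase_units(total_dna_ug: float) -> float:
    """Units of T4 DNA ligase for a given total mass of DNA fragments (µg).

    Assumes the stated ratio scales linearly with DNA mass.
    """
    if total_dna_ug < 0:
        raise ValueError("DNA mass must be non-negative")
    return LIGASE_UNITS_PER_UG * total_dna_ug
```

Under that assumption, a reaction containing 2 µg of DNA
fragments would call for 40 units of ligase.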

Unless otherwise stated, transformation was performed as
described in Sambrook and Maniatis, Molecular Cloning: A
Laboratory Manual, Cold Spring Harbor Laboratory, 1989.

One means for isolating the nucleic acid molecules
encoding the enzymes of the present invention is to probe a
gene library with a natural or artificially designed probe
using art recognized procedures (see, for example: Current
Protocols in Molecular Biology, Ausubel F.M. et al. (EDS.)
Green Publishing Company Assoc. and John Wiley Interscience,
New York, 1989, 1992). It is appreciated by one skilled in
the art that the polynucleotides of SEQ ID NOS: 1-16, or
fragments thereof (comprising at least 10 or 12 contiguous
nucleotides), are particularly useful probes. Other
particularly useful probes for this purpose are
hybridizable fragments to the sequences of SEQ ID NOS: 19-27
(i.e., comprising at least 10 or 12 contiguous nucleotides).



It is also appreciated that such probes can be and are
preferably labeled with an analytically detectable reagent to
facilitate identification of the probe. Useful reagents
include but are not limited to radioactivity, fluorescent
dyes or enzymes capable of catalyzing the formation of a
detectable product. The probes are thus useful to isolate
complementary copies of DNA from other sources or to screen
such sources for related sequences.

With respect to nucleic acid sequences which hybridize
to specific nucleic acid sequences disclosed herein,
hybridization may be carried out under conditions of reduced
stringency, medium stringency or even stringent conditions.
As an example of oligonucleotide hybridization, a polymer
membrane containing immobilized denatured nucleic acids is
first prehybridized for 30 minutes at 45°C in a solution
consisting of 0.9 M NaCl, 50 mM NaH2PO4, pH 7.0, 5.0 mM
Na2EDTA, 0.5% SDS, 10X Denhardt's, and 0.5 mg/mL
polyriboadenylic acid. Approximately 2 X 10^7 cpm (specific
activity 4-9 X 10^9 cpm/µg) of 32P end-labeled oligonucleotide
probe are then added to the solution. After 12-16 hours of
incubation, the membrane is washed for 30 minutes at room
temperature in 1X SET (150 mM NaCl, 20 mM Tris hydrochloride,
pH 7.8, 1 mM Na2EDTA) containing 0.5% SDS, followed by a 30
minute wash in fresh 1X SET at Tm -10°C for the
oligonucleotide probe. The membrane is then exposed to
autoradiographic film for detection of hybridization signals.
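
The final wash above is set relative to the melting temperature of the probe (Tm -10°C), but the text does not say how Tm is obtained. As a rough illustration only, the sketch below assumes the Wallace rule of thumb for short oligonucleotides (2°C per A or T, 4°C per G or C). The function names are hypothetical, and the example probe is simply the first 18 bases of the coding region listed for SEQ ID NO:22.

```python
def wallace_tm(oligo: str) -> int:
    """Estimate Tm in deg C with the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: this rule is NOT named in the text; it is a common
    approximation for short oligonucleotide probes.
    """
    seq = oligo.upper().replace(" ", "")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("oligo contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


def final_wash_temp(oligo: str, offset: int = 10) -> int:
    """Temperature of the second 1X SET wash, i.e. Tm minus the offset."""
    return wallace_tm(oligo) - offset


# Hypothetical probe: first 18 bases of the SEQ ID NO:22 coding region.
probe = "ATGAGAACCCTGACAATA"
print(f"Tm ~ {wallace_tm(probe)} C, final wash at ~ {final_wash_temp(probe)} C")
```

For longer probes or different salt conditions a more detailed Tm model would be needed; the rule above is only a first estimate.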

Stringent conditions means hybridization will occur only
if there is at least 90% identity, preferably 95% identity
and most preferably at least 97% identity between the
sequences. See J. Sambrook et al., Molecular Cloning, A
Laboratory Manual (2d Ed. 1989) (Cold Spring Harbor
Laboratory) which is hereby incorporated by reference in its
entirety.

"Identity" as the term is used herein, refers to a
polynucleotide sequence which comprises a percentage of the
same bases as a reference polynucleotide (SEQ ID NOS:1-16).
For example, a polynucleotide which is at least 90% identical
to a reference polynucleotide, has polynucleotide bases which
are identical in 90% of the bases which make up the reference
polynucleotide and may have different bases in 10% of the
bases which comprise that polynucleotide sequence.
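
The definition of identity above amounts to counting matching positions. A minimal sketch follows, assuming the two sequences have already been aligned to equal length (the text does not prescribe an alignment method); `percent_identity` is a hypothetical helper name.

```python
def percent_identity(reference: str, candidate: str) -> float:
    """Percentage of positions at which candidate matches reference.

    Assumption: both sequences are pre-aligned and of equal length.
    """
    ref = reference.upper().replace(" ", "")
    cand = candidate.upper().replace(" ", "")
    if not ref:
        raise ValueError("empty reference sequence")
    if len(ref) != len(cand):
        raise ValueError("sequences must be the same length (align them first)")
    matches = sum(1 for r, c in zip(ref, cand) if r == c)
    return 100.0 * matches / len(ref)


# One mismatch in ten bases corresponds to 90% identity.
print(percent_identity("ATGAGAACCC", "ATGAGAACCA"))
```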

The present invention relates to polynucleotides which
differ from the reference polynucleotide such that the
differences are silent changes, that is, the amino acid
sequence encoded by both polynucleotides is the same. The
present invention also relates to nucleotide changes which
result in amino acid substitutions, additions, deletions,
fusions and truncations in the polypeptide encoded by the
reference polynucleotide. In a preferred aspect of the
invention these polypeptides retain the same biological
action as the polypeptide encoded by the reference
polynucleotide.
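
Whether a nucleotide change is silent can be checked by translating both sequences and comparing the encoded amino acids. The sketch below assumes the standard genetic code and a reading frame that starts at the first base; `translate` and `is_silent` are hypothetical names used only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop."""
    seq = dna.upper().replace(" ", "")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def is_silent(reference: str, variant: str) -> bool:
    """True when both sequences encode the same amino acid sequence."""
    return translate(reference) == translate(variant)


# AGA and CGT both encode arginine, so this change is silent.
print(is_silent("ATG AGA ACC", "ATG CGT ACC"))
# AGA (arginine) to AAA (lysine) is an amino acid substitution.
print(is_silent("ATG AGA ACC", "ATG AAA ACC"))
```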

The polynucleotides of this invention were recovered
from genomic gene libraries from the organisms listed in
Table 1. Gene libraries were generated in the Lambda ZAP II
cloning vector (Stratagene Cloning Systems). Mass excisions
were performed on these libraries to generate libraries in
the pBluescript phagemid. Libraries were generated and
excisions were performed according to the protocols/methods
hereinafter described.

The excision libraries were introduced into the E. coli
strain BW14893 F'kanlA. Expression clones were then
identified using a high temperature filter assay using
phosphatase buffer containing 1 mg/ml BCIP (5-Bromo-4-chloro-
3-indolyl phosphate). Expression clones encoding BCIPases
were identified and repurified from the following organisms:
Ammonifex degensii KC4, Methanococcus igneus Kol5,
Thermococcus alcaliphilus AEDII12RA, Thermococcus celer,
Thermococcus GU5L5, OC9a, M11TL, Thermococcus CL-2 and
Aquifex VF-5.

Expression clones were identified by use of a high
temperature filter assay with either acid phosphatase buffer
or alkaline phosphatase buffer containing BCIP (Metcalf, et
al., Evidence for two phosphonate degradative pathways in
Enterobacter aerogenes, J. Bacteriol., 174:2501-2510 (1992)).

BCIPase activity was tested as follows: An excision
library was introduced into the E. coli strain BW14893 F'kan,
a pho- pnh- lac- strain. After growth on 100 mm LB plates
containing 100 µg/ml ampicillin, 80 µg/ml methicillin, and 1 mM
IPTG, colony lifts were performed using Millipore HATF
membrane filters. The colonies transferred to the filters
were lysed with chloroform vapor in 150 mm glass petri
dishes. The filters were transferred to 100 mm glass petri
dishes containing a piece of Whatman 3 MM filter paper
saturated with either acid phosphatase buffer (see recipe
below) or alkaline phosphatase buffer (see recipe below)
containing no BCIP. The dish was placed in the oven at
80-85°C for 30-45 minutes to heat inactivate endogenous E. coli
phosphatases. The filters bearing lysed colonies were then
transferred to a 100 mm glass petri dish containing 3 MM paper
saturated with either acid phosphatase buffer or alkaline
phosphatase buffer containing 1 mg/ml BCIP. The dish was
placed in the oven at 80-85°C.



Alkaline Phosphatase Buffer (referenced in Sambrook, J.
et al. (1989) Molecular Cloning, A Laboratory Manual, p.
1874) includes 100 mM NaCl, 5 mM MgCl2 and 100 mM Tris-HCl (pH
9.5). Clones expressing phosphatase activity (when the
alkaline phosphatase buffer was used) were derived from
libraries derived from the organisms identified above.

Acid Phosphatase Buffer includes 100 mM NaCl, 5 mM MgCl2
and 100 mM Tris-HCl (pH 6.8). Clones expressing phosphatase
activity (when the acid phosphatase buffer was used) were
derived from the library derived from M11TL.

Positives were observed as blue spots on the filter
membranes. The following filter rescue technique was used to
retrieve plasmid from lysed positive colonies.

Filter Rescue Technique: A pasteur pipette (or glass
capillary tube) was used to core blue spots on the filter
membrane. The small filter disk was placed in an Eppendorf
tube containing 20 µl of deionized water. The Eppendorf tube
was incubated at 75°C [...] to
elute plasmid DNA off the filter. Plasmid DNA containing DNA
inserts from Thermococcus alcaliphilus AEDII12RA was used to
transform electrocompetent E. coli DH10B cells.
Electrocompetent BW14893 F'kanlA E. coli cells were used for
transformation of plasmid DNA containing inserts from
Ammonifex degensii KC4, Methanococcus igneus Kol5, and
Thermococcus GU5L5. The filter-lift assay was repeated on
transformation plates to identify "positives." The
transformation plates were returned to a 37°C incubator to
regenerate colonies. 3 ml of LBamp liquid was inoculated
with repurified positives and incubated at 37°C overnight.
Plasmid DNA was isolated from these cultures and plasmid
inserts were sequenced.

In some instances where the plates used for the initial
colony lifts contained non-confluent colonies, a specific
colony corresponding to a blue spot on the filter could be
identified on a regenerated plate and repurified directly,
instead of using the filter rescue technique. This
"repurification" protocol was used for plasmid DNA containing
inserts from the following: Ammonifex degensii KC4,
Thermococcus celer, M11TL, and Aquifex VF-5.

The filter rescue technique was used for DNA from the
following organisms: Ammonifex degensii KC4, Methanococcus
igneus Kol5, Thermococcus alcaliphilus AEDII12RA,
Thermococcus CL-2, and OC9a.

Phosphatases are a group of key enzymes that remove 
phosphate groups from organophosphate ester compounds. The 
most important phosphatases for commercial purposes are 
alkaline phosphatases, phosphodiesterases, and phytases. 

Alkaline phosphatases have several commercial
applications, including their use in analytical applications
as an enzyme label in ELISA immunoassays and enzyme-linked
gene probes, and their use in research applications for
removing 5' phosphates in polynucleotides prior to
end-labeling and for dephosphorylating vectors prior to insert
ligation (see also Current Protocols in Molecular Biology,
(John Wiley & Sons) (1995), chapter 3, section 10).
Alkaline phosphatase hydrolyzes monophosphate esters,
releasing inorganic phosphate and the cognate alcohol
compound. It is non-specific with respect to the alcohol
moiety, a feature which accounts for the many uses of this
enzyme. The enzyme has a pH optimum between 9 and 10;
however, it can also work at neutral pH. (From a study of
the enzyme industry conducted by Business Communications
Co., Inc., 25 Van Zant Street, Norwalk, CT 06855, 1995.)

Two sources of alkaline phosphatase dominate and compete
in the market: animal, from bovine and calf intestinal
mucosa, and bacterial, from E. coli. Due to the high
turnover number of calf intestinal phosphatase, it is often
selected as the label in many enzyme immunoassays. The
usefulness of calf alkaline phosphatase is limited by its
inherently low thermal stability, which is even further
compromised during the chemical preparation of enzyme:
antibody conjugates. Bacterial alkaline phosphatase could be
an attractive alternative to calf alkaline phosphatase due to
bacterial alkaline phosphatase's extreme thermotolerance at
temperatures as high as 95°C (Tomazic-Allen S.J.,
Recombinant bacterial alkaline phosphatase as an
immunodiagnostic enzyme, Annales de Biologie Clinique, 1991,
49(5):287-90).

Antibodies generated against the enzymes corresponding
to a sequence of the present invention can be obtained by
direct injection of the enzymes into an animal or by
administering the enzymes to an animal, preferably a
nonhuman. The antibody so obtained will then bind the
enzymes themselves. In this manner, even a sequence encoding
only a fragment of the enzymes can be used to generate
antibodies binding the whole native enzymes. Such antibodies
can then be used to isolate the enzyme from cells expressing
that enzyme.

For preparation of monoclonal antibodies, any technique
which provides antibodies produced by continuous cell line
cultures can be used. Examples include the hybridoma
technique (Kohler and Milstein, 1975, Nature, 256:495-497),
the trioma technique, the human B-cell hybridoma technique
(Kozbor et al., 1983, Immunology Today 4:72), and the EBV-
hybridoma technique to produce human monoclonal antibodies
(Cole et al., 1985, in Monoclonal Antibodies and Cancer
Therapy, Alan R. Liss, Inc., pp. 77-96).

Techniques described for the production of single chain
antibodies (U.S. Patent 4,946,778) can be adapted to produce
single chain antibodies to immunogenic enzyme products of
this invention. Also, transgenic mice may be used to express
humanized antibodies to immunogenic enzyme products of this
invention.

Antibodies, as described above, may be employed as a
probe to screen a library to identify the above-described
activities or cross-reactive activities in gene libraries
generated from the organisms described above or other
organisms.



WO 97/48416 



PCT/US97/10784 



Example 1 

Bacterial Expression and Purification of Alkaline 
Phosphatase Enzymes 

DNA encoding the enzymes of the present invention, SEQ
ID NOS:1 through 16, were initially amplified from a
pBluescript vector containing the DNA by the PCR technique
using the primers noted herein. The amplified sequences were
then inserted into the respective pQE vector listed beneath
the primer sequences, and the enzyme was expressed according
to the protocols set forth herein. The 5' and 3'
oligonucleotide primer sequences used for subcloning and
vectors for the respective genes are as follows:
Ammonifex degensii KC4 - 3A1A

5' CCGA GAA TTC ATT AAA GAG GAG AAA TTA ACT ATG GGG GCA GGT CCG AAA AGG 3'
5' CCGA GGA TCC TCA CCG CCC CCT GCG GGT CCG 3'

Vector: pQET3

Methanococcus igneus Kol5 - 9A1A

5' CCGA GAA TTC ATT AAA GAG GAG AAA TTA ACT ATG TTG GAT ATA CTG CTT GTT 3'
5' CCGA GGA TCC TTA TTT TTT AAC CAA ATT TCC C 3'

Vector: pQET3

Thermococcus alcaliphilus AEDII12RA - ISA

5' CCGA CAA TTG ATT AAA GAG GAG AAA TTA ACT ATG ATG ATG GAA TTC ACT CGC 3'
5' CGGA GGA TCC CTA CAG TTC TAA AAG TCT TTT A 3'

Vector: pQET3

Thermococcus celer - 25A1A (incorporating MfeI restriction site)

5' CCGA CAA TTG ATT AAA GAG GAG AAA TTA ACT ATG AGA ACC CTG ACA ATA AAC 3'
5' CCGA GGA TCC TTA CAC CCA CAG AAC CCT TAC 3'

Vector: pQET3

Thermococcus GU5L5 - 26A1A

5' CCGA GAA TTC ATT AAA GAG GAG AAA TTA ACT ATG AAA GGA AAG TCT CTT GTT 3'
5' CCGA GGA TCC TCA AGC TTC CTG GAG AAT CAA 3'

Vector: pQET3

OC9a - 27A3A

5' CCGA GAA TTC ATT AAA GAG GAG AAA TTA ACT ATG CCA AGA AAT ATC GCC GCT 3'
5' CCGA GGA TCC TTA AGG CTT CTC GAG GTG GGG GTT 3'

Vector: pQET3

M11TL - 29A1A (incorporating MfeI restriction site)

5' CCGA CAA TTG ATT AAA GAG GAG AAA TTA ACT ATG TAT AAA TGG ATT ATT GAG GG 3'
5' CCGA GGA TCC CTA AAC ATA GTC TAA GTA ATT AGC 3'

Vector: pQET3

Thermococcus CL-2 - 30A1A

5' CCGA GAA TTC ATT AAA GAG GAG AAA TTA ACT ATG AGA ATC CTC CTC ACC AAC 3'
5' CCGA GGA TCC TCA CAG GCT CAG AAG CCT TTG 3'

Vector: pQET3

Aquifex VF-5 - 34A1A

5' CCGA GAA TTC ATT AAA GAG GAG AAA TTA ACT ATG GAA AAC TTA AAA AAG TAC CT 3'
5' CCGA GGA TCC TCA CCG CCC CCT GCG GGT GCG 3'

Vector: pQET3



The restriction enzyme sites indicated correspond to the
restriction enzyme sites on the bacterial expression vector
indicated for the respective gene (Qiagen, Inc. Chatsworth,
CA). The pQE vector encodes antibiotic resistance (Amp^r), a
bacterial origin of replication (ori), an IPTG-regulatable
promoter operator (P/O), a ribosome binding site (RBS), a 6-
His tag and restriction enzyme sites.
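
The restriction sites carried by each primer can be located with a simple string search. The sketch below is illustrative only: the text names MfeI explicitly, while the EcoRI (GAATTC) and BamHI (GGATCC) labels are standard recognition-site assignments added here as annotations, and `find_sites` is a hypothetical helper.

```python
# Recognition sequences; only MfeI is named in the text, the other two
# labels are standard assignments added for illustration.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "MfeI": "CAATTG"}


def find_sites(primer: str) -> dict:
    """Map each enzyme whose site occurs in the primer to its 0-based index."""
    seq = primer.upper().replace(" ", "")
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


# Primers as listed above for Ammonifex degensii KC4 and Thermococcus celer.
kc4_forward = "CCGA GAA TTC ATT AAA GAG GAG AAA TTA ACT ATG GGG GCA GGT CCG AAA AGG"
celer_forward = "CCGA CAA TTG ATT AAA GAG GAG AAA TTA ACT ATG AGA ACC CTG ACA ATA AAC"
celer_reverse = "CCGA GGA TCC TTA CAC CCA CAG AAC CCT TAC"

for label, primer in [("KC4 5'", kc4_forward),
                      ("celer 5'", celer_forward),
                      ("celer 3'", celer_reverse)]:
    print(label, find_sites(primer))
```

EcoRI and MfeI leave the same AATT overhang, so an MfeI-cut insert can be ligated into an EcoRI-cut vector. That is presumably why MfeI appears in the 5' primers for genes whose listed coding sequence itself contains GAATTC, such as the Thermococcus celer sequence, although the text does not state the reason.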

The pQE vector was digested with the restriction enzymes
indicated. The amplified sequences were ligated into the
respective pQE vector and inserted in frame with the sequence
encoding for the RBS. The native stop codon was incorporated
so the genes were not fused to the His tag of the vector.
The ligation mixture was then used to transform the E. coli
strain M15/pREP4 (Qiagen, Inc.) by electroporation.
M15/pREP4 contains multiple copies of the plasmid pREP4,
which expresses the lacI repressor and also confers kanamycin
resistance (Kan^r). Transformants were identified by their
ability to grow on LB plates and ampicillin/kanamycin
resistant colonies were selected. Plasmid DNA was isolated
and confirmed by restriction analysis. Clones containing the
desired constructs were grown overnight (O/N) in liquid
culture in LB media supplemented with both Amp (100 µg/ml)
and Kan (25 µg/ml). The O/N culture was used to inoculate a
large culture at a ratio of 1:100 to 1:250. The cells were
grown to an optical density 600 (O.D.600) of between 0.4 and
0.6. IPTG ("Isopropyl-B-D-thiogalactopyranoside") was then
added to a final concentration of 1 mM. IPTG induces by
inactivating the lacI repressor, clearing the P/O leading to
increased gene expression. Cells were grown an extra 3 to 4
hours. Cells were then harvested by centrifugation.
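
The inoculation ratio and the IPTG addition described above reduce to simple dilution arithmetic. The sketch below works through both for a hypothetical 1 liter culture and an assumed 1 M IPTG stock; neither value is given in the text, and the function names are illustrative.

```python
def inoculum_volume_ml(culture_ml: float, dilution: float) -> float:
    """Volume of overnight culture for a 1:dilution inoculation."""
    return culture_ml / dilution


def iptg_stock_volume_ml(culture_ml: float, final_mM: float = 1.0,
                         stock_mM: float = 1000.0) -> float:
    """Volume of IPTG stock needed to reach final_mM in the culture.

    Assumption: a 1 M (1000 mM) stock; the added volume is treated as
    negligible relative to the culture volume.
    """
    return culture_ml * final_mM / stock_mM


culture_ml = 1000.0  # hypothetical large culture volume
print("O/N inoculum at 1:100:", inoculum_volume_ml(culture_ml, 100), "ml")
print("O/N inoculum at 1:250:", inoculum_volume_ml(culture_ml, 250), "ml")
print("IPTG stock for 1 mM:", iptg_stock_volume_ml(culture_ml), "ml")
```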

The primer sequences set out above may also be employed 
to isolate the target gene from the deposited material by 
hybridization techniques described above. 
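
A 3' primer anneals to its target through the reverse complement of its gene-specific portion. As a small consistency check, the sketch below takes the Thermococcus celer 3' primer (SEQ ID NO:8) and the last row of the SEQ ID NO:22 listing and tests whether the primer's gene-specific tail matches the end of the gene. Treating the first 10 bases of the primer as a non-annealing tail (four extra bases plus the six-base restriction site) is an assumption made for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (spaces ignored)."""
    return seq.upper().replace(" ", "").translate(COMPLEMENT)[::-1]


def annealing_region(primer: str, tail_length: int = 10) -> str:
    """Gene-specific part of a primer, after an assumed 10-base 5' tail."""
    return primer.upper().replace(" ", "")[tail_length:]


# SEQ ID NO:8 (Thermococcus celer 3' primer) as listed.
reverse_primer = "CCGAGGATCC TTACACCCAC AGAACCCTTA C"
# Final 48 bases of SEQ ID NO:22 as listed, ending in the TAA stop codon.
gene_3prime_end = "ACTGCGAGGGTGGACTTTGAGGAGCTTGTAAGGGTTCTGTGGGTGTAA"

target = reverse_complement(annealing_region(reverse_primer))
print(target)
print("primer matches 3' end of gene:", gene_3prime_end.endswith(target))
```

The same check can be applied to any primer and gene pair in the listing; a mismatch would point to a transcription problem in either sequence.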
Example 2
Isolation of A Selected Clone From the Deposited Genomic
Clones

A clone is isolated directly by screening the deposited
material using the oligonucleotide primers set forth in
Example 1 for the particular gene desired to be isolated.
The specific oligonucleotides are synthesized using an
Applied Biosystems DNA synthesizer.

The two oligonucleotide primers corresponding to the
gene of interest are used to amplify the gene from the
deposited material. A polymerase chain reaction is carried
out in 25 µl of reaction mixture with 0.1 µg of the DNA of
the gene of interest. The reaction mixture is 1.5-5 mM MgCl2,
0.01% (w/v) gelatin, 20 µM each of dATP, dCTP, dGTP, dTTP, 25
pmol of each primer and 1.25 Unit of Taq polymerase. Thirty
cycles of PCR (denaturation at 94°C for 1 min; annealing at
55°C for 1 min; elongation at 72°C for 1 min) are performed
with the Perkin-Elmer Cetus 9600 thermal cycler. The
amplified product is analyzed by agarose gel electrophoresis
and the DNA band with expected molecular weight is excised
and purified. The PCR product is verified to be the gene of
interest by subcloning and sequencing the DNA product. The
ends of the newly purified genes are nucleotide sequenced to
identify full length sequences. Complete sequencing of full
length genes is then performed by Exonuclease III digestion
or primer walking.
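
The cycling conditions given above can be written down as a small data structure, which also makes the total programmed hold time easy to compute. This is only a sketch: the `Step` class and function name are hypothetical, and ramp times between temperatures, which depend on the instrument, are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


# One PCR cycle as described in the text.
CYCLE = (
    Step("denaturation", 94, 60),
    Step("annealing", 55, 60),
    Step("elongation", 72, 60),
)


def total_hold_minutes(cycle, n_cycles: int = 30) -> float:
    """Total programmed hold time in minutes, excluding ramping."""
    return n_cycles * sum(step.seconds for step in cycle) / 60


for step in CYCLE:
    print(f"{step.name}: {step.temp_c} C for {step.seconds} s")
print("total hold time (min):", total_hold_minutes(CYCLE))
```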

Numerous modifications and variations of the present
invention are possible in light of the above teachings and,
therefore, within the scope of the appended claims, the
invention may be practiced otherwise than as particularly
described.
(A) APPLICATION NUMBER: Unassigned

(B) FILING DATE: June 19, 1997 

(C) CLASSIFICATION: Unassigned 

PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

ATTORNEY/AGENT INFORMATION:

(A) NAME: Haile, Lisa A.

(B) REGISTRATION NUMBER: 38,347

(C) REFERENCE/DOCKET NUMBER: 09010/015WO1

TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 619-678-5070 

(B) TELEFAX: 619-678-5099 
(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 52 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGGGGGCA GGTCCGAAAA GG 52

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 31 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

CCGAGGATCC TCACCGCCCC CTGCGGGTGC G 31

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 52 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGTTGGAT ATACTGCTTG TT 52

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 32 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

CCGAGGATCC TTATTTTTTA ACCAAATTTC CC 32

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 52 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

CCGACAATTG ATTAAAGAGG AGAAATTAAC TATGATGATG GAATTCACTC GC 52

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 32 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

CGGAGGATCC CTACAGTTCT AAAAGTCTTT TA 32

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 52 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

CCGACAATTG ATTAAAGAGG AGAAATTAAC TATGAGAACC CTGACAATAA AC 52

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 31 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

CCGAGGATCC TTACACCCAC AGAACCCTTA C 31
(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 52 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGAAAGGA AAGTCTCTTG TT 52

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 31 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

CCGAGGATCC TCAAGCTTCC TGGAGAATCA A 31

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 52 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGCCAAGA AATATCGCCG CT 52

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 34 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

CGGAGGATCC TTAAGGCTTC TCGAGGTGGG GGTT 34
(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 52 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

CCGACAATTG ATTAAAGAGG AGAAATTAAC TATGTATAAA TGGATTATTG AGGG 54

(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 34 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

CCGAGGATCC CTAAACATAG TCTAAGTAAT TAGC 34

(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 52 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGAGAATC CTCCTCACCA AC 52

(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 31 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

CCGAGGATCC TCACAGGCTC AGAAGCCTTT G 31
(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 54 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: GENOMIC DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGGAAAAC TTAAAAAAGT ACCT 54

(2) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 31 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

CGGAAGATCT TCACACCGCC ACTTCCATAT A 31

(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 783 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

ATG AGG GGG AGC GGA GTG CGG ATA CTT CTC ACC AAC GAT GAC GGC ATC 48
TTT GCC GAG GGT CTG GGG GCT CTG CGC AAG ATG CTG GAG CCC GTG GCT 96
ACC CTT TAC GTG GTG GCT CCG GAC CGA GAG CGT AGC GCG GCC AGC CAT 144
GCT ATC ACC GTT CAC CGC CCC CTG CGG GTG CGG GAG GCG GGT TTT CGC 192
AGC CCC AGG CTT AAA GGC TGG GTA GTG GAC GGT ACC CCG GCC GAC TGC 240
GTC AAG CTG GGC CTG GAG GTA CTT TTG CCC GAA CGT CCA GAT TTC CTG 288
GTT TCG GGC ATA AAC TAC GGG CCC AAC CTG GGT ACC GAC GTA CTT TAC 336
TCC GGC ACC GTC TCG GCG GCC ATA GAA GGG GTA ATT AAC GGC ATT CCC 384
TCG GTG GCC GTA TCT TTG GCC ACG CGG CGG GAG CCG GAC TAT ACC TGG 432
GCG GCC CGG TTC GTC CTG GTC CTG CTG GAG GAA CTG CGA AAA CAC CAA 480
CTG CCC CCA GGA ACC CTG CTC AAC GTC AAC GTG CCC GAC GGG GTG CCC 528
CGC GGG GTC AAG GTG ACC AAA CTG GGA AGC GTA CGC TAC GTC AAC GTG 576
GTA GAC TGC CGC ACC GAC CCT CGG GGG AAG GCT TAC TAC TGG ATG GCG 624
GGA GAA CCA TTG GAG CTG GAC GGC AAC GAC TCC GAA ACC GAC GTC TGG 672
GCG GTG CGA GAA GGC TAT ATT TCC GTA ACA CCG GTC CAG ATC GAC CTT 720
ACT AAC TAC GGC TTC CTG GAA GAA CTC AAA AAA TGG CGT TTC AAG GAT 768
ATC TTT TCT TCT TAA 783

(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 765 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

ATG TTG GAT ATA CTG CTT GTT AAT GAT GAT GGC ATT TAT TCA AAT GGA 48
TTA ATA GCT TTG AAG GAT GCA TTA TTG GAA AAA TTT AAT GCG AGG ATT 96
ACT ATT GTA GCC CCA ACA AAT CAG CAG AGT GGT ATT GGT AGG GCA ATA 144
AGT TTA TTC GAG CCG TTA AGG ATA ACT AAA ACC AAA TTA GCA GAT GGT 192
TCT TGG GGA TAT GCA GTT TCA GGA ACC CCA ACA GAT TGC GTT ATA TTG 240
GGC ATT TAT GAG ATA TTA AAG AAG GTA CCT GAT GTA GTT ATA TCA GGA 288
ATA AAC ATT GGA GAA AAC CTT GGG ACT GAA ATA ACA ACT TCT GGA ACG 336
TTG GGG GCT GCG TTT GAA GGG GCC CAT CAT GGG GCT AAG GCA TTA GCA 384
TCA TCA CTC CAA GTT ACC TCT GAC CAT CTA AAG TTT AAA GAG GGG GAG 432
ACC CCA ATA GAC TTC ACA GTC CCA GCA AGA ATT ACT GCA AAT GTT GTT 480
GAG AAG ATG TTG GAT TAT GAT TTC CCA TGT GAT GTC GTC AAC TTA AAC 528
ATT CCA GAA GGA GCA ACA GAA AAG ACA CCG ATT GAA ATC ACA AGG TTG 576
GCA ?GG AAA ATG TAT ACA ACA CAC GTT GAG GAA AGA ATA GAT CCA AGA 624
GGG AGG AGT TAT TAT TGG ATT GAT GGG TAT CCT ATT TTA GAG GAA GAG 672
GAA GAC ACT GAT GTC TAT GTT GTT AGA AGA AAG GGA CAT ATT TCT CTA 720
ACC CCA TTA ACA TTA GAC ACA ACA ATT AAA AAT TTA GAG GAA TTT AAG 768
AAA AAA TAT GAG AGA ATA TTA AAT GAA TGA 798
(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 755 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:



ATG ATG ATG GAA TTC ACT CGC GAG GGA ATA AAA GCT GCT GTA GAG GCA 48
CTT CAA GGG TTA GGA GAG ATC TAC GTA GTT GCC CCA ATG TTT CAA AGG 96
AGC GCA AGT GGA AGG GCA ATG ACC ATC CAC AGA CCT CTA AGG GCT AAA 144
AGA ATA AGT ATG AAC GGT GCA AAA GCA GCC TAT GCT TTG GAT GGA ATG 192
[illegible] [illegible] GAT TGC GTT ATC TTT GCC ATG GCC AGA TTT GGA GAT TTC GAC 240
CTT GCA ATA AGT GGT GTA AAC TTG GGA GAA AAC ATG AGC ACC GAG ATA 288
ACG GTT TCC GGG ACT GCA AGC GCT GCA ATA GAG GCT GCA ACC CAA GAG 336
ATC CCA AGC ATT CCC ATA AGC CTG GAA GTT AAT AGA GAA AAA CAC AAA 384
TTT GGT GAG GGC GAA GAG ATT GAC TTC TCA GCT GCC AAG TAT T?C CTA 432
AGA AAA ATC GCA ACG GCG GTT TTA AAG AGA GGC CTC CCC AAA GGA GTC 480
GAT ATG CTG AAC GTC AAC GTC CCT TAT GAT GCA AAT GAA AGG ACA GAG 528
ATA GCT TTT ACT CGC CTG GCA AGA AGG ATG TAT AGG CCT TCT ATT GAA 576
GAG CGC ATA GAC CCA AAG GGG AAT CCC TAC TAC TGG ATA GTT GGA ACT 624
CAG TGC CCT AAG GAG GCA TTA GAG CCG GGA ACG GAT ATG TAT GTA GTT 672
AAA GTT GAG AGA AAA GTT AGC GTG ACT CCA ATA AAC ATT GAT ATG ACA 720
GCA AGA GTG AAT TTA GAC GAG ATT AAA AGA CTT TTA GAA CTG TAG 765



(2) INFORMATION FOR SEQ ID NO:22:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 816 NUCLEOTIDES
(B) TYPE: NUCLEIC ACID
(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:

ATG AGA ACC CTG ACA ATA AAC ACT GAC GCG GAG GGG TTC GTT TTG AGG 48
ATT CTC CTG ACG AAC GAC GAT GGA ATC TAC TCC AAC GGA CTG CGC GCC 96
GCT GTG AAA GCC CTG AGT GAG CTC GGC GAA GTT TAC GTC GTT GCC CCC 144
CTC TTC CAG AGG AGC GCG AGC GGC AGG GCC ATG ACG CTC CAC AGG CCG 192
ATA AGG GCC AAG CGC GTT GAC GTT CCC GGC GCA AAG ATA GCC TAC GGA 240
ATA GAT GGA ACT CCT ACT GAC TGC GTG ATT TTC GCC ATA GCC CGC TTC 288
GGG AGC TTT GGT TTA GCC GTG AGC GGG ATT AAC CTC GGC GAG AAC CTG 336
AGC ACC GAG ATA ACA GTC TCA GGG ACG GCC TCC GCT GCC ATA GAG GCC 384
TCA ACT CAT GGA ATT CCG AGC ATA GCG ATT AGC CTT GAG GTG GAG TGG 432
AAG AAG ACC CTC GGC GAG GGT GAG GGG GTT GAC TTC TCG GTC TCG ACT 480
CAC TTC CTC AAG AGA ATC GCG GGA GCC CTC TTG GAG AGA GGT CTT CCT 528
GAG GGC GTT GAC ATG CTC AAC GTC AAC GTT CCG AGC GAC GCG ACG GAG 576
GAA ACG GAG ATA GCA ATC ACC CGC TTA GCC CGG AAG CGC TAC TCC CCA 624
ACG GTC GAG GAG AGG ATT GAC CCC AAG GGC AAC CCC TAC TAC TGG ATT 672
GTC GGC AAA CTT GTC CAA GAC TTC GAG CCA GGG ACA GAT GCC TAC GCC 720
CTG AAG GTC GAG AGG AAG GTC AGC GTC ACG CCG ATA AAC ATA GAT ATG 768
ACT GCG AGG GTG GAC TTT GAG GAG CTT GTA AGG GTT CTG TGG GTG TAA 816

(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 14 94 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: genomic DNA 

(xi) • SEQUENCE DESCRIPTION : SEQ ID NO: 23: 

ATG AAA GGA AAG TCT CTT GTT AGC GGT CTG TTG TTG GGT CTT TTA ATT 48

TTG AGC CTG ATT TCA TTC CAG CCA AGC TTT GCA TAC TCC CCA CAC GGC 96

GGT GTC AAA AAC ATC ATA ATC CTG GTT GGA GAC GGC ATG GGT CTT GGG 144

CAT GTA GAA ATT ACA AAG CTC GTT TAT GGA CAC TTA AAC ATG GAA AAC 192

TTT CCA GTT ACT GGA TTT GAG CTT ACT GAT TCC CTA AGT GGT GAA GTT 240

ACA GAT TCT GCT GCG GCA GGA ACT GCA ATA TCC ACT GGA GCT AAA ACG 288

TAT AAT GGT ATG ATT TCA GTA ACC AAC ATA ACC GGA AAG ATA GTT AAC 336

TTA ACA ACC CTA CTT GAA GTG GCT CAA GAG CTT GGG AAG TCA ACA GGG 384

CTG GTC ACC ACA ACA AGG ATT ACC CAT GCA ACT CCA GCA GTT TTT GCG 432

TCC CAT GTC CCA GAT AGG GAT ATG GAG GGG GAG ATA CCC AAG CAA CTC 480

ATA ATG CAC AAA GTT AAC GTC TTG TTG GGT GGT GGA AGG GAG AAA TTC 528

GAT GAG AAA AAT TTG GAG CTG GCC AAA AAG CAG GGA TAC AAA GTA GTT 576

TTC ACG AAG GAA GAG CTT GAA AAA GTT GAA GGA GAT TAT GTC CTA GGA 624

CTC TTT GCA GAA AGT CAC ATC CCT TAC GTA TTG GAT AGA AAA CCC GAT 672

GAT GTT GGA CTT TTA GAA ATG GCC AAA AAG GCA ATT TCA ATA CTC GAG 720

AAG AAC CCG AGC GGA TTC TTT CTC ATG GTT GAG GGC GGA AGG ATT GAC 768

CAT GCA GCC CAT GGA AAC GAT GTC GCA TCG GTT GTT GCA GAA ACT AAG 816

GAG TTT GAC GAT GTT GTC AGA TAC GTG CTG GAA TAT CCG AAG AAG AGG 864

GGA GAT ACC TTG GTA ATA GTG CTT GCC GAT CAC GAA ACT GGA GGT CTT 912

GCA ATA GGT CTA ACG TAT GGA AAT GCA ATC GAT GAA GAT GCC ATA AGA 960

AAA ATA AAA NNN NNN ACG TTG AGG ATG CCC AAA GAG GTT AAG GCA GGG 1008

AGT AGT GTA AAA NAN TCC TCA AAG GTA TGC CGG ATT TGT CCC AAC AGA 1056

GGA AGA AGT CAG TAT ATT GAG AAT GCG CTG CAC TCG ACA AAC AAG TAT 1104

NNN NNN TCA AAT GCA GTA GCC GAT GTT ATA AAC AGG CGT ATT GGT GTT 1152

NNN NNN ACC TCC TAT GAG CAT ACA GGA GTT CCA GTT CCG CTC TTA GCT 1200

TAN GGT CCC GGG GCA GAG AAC TTC AGA GGT TTC TTA CAC CAT GTG GAT 1248

NNN NNN NNN TTA GTT GCA AAG TTA ATG CTC TTT GGA AGG AGG AAT ATT 1296

CCA GTT ACN ATT TCA AGC GTG AGC AGT GTT AAG GGA GAC ATA ACC GGT 1344

GAT TAC AGG GTT GAT GAG AAG GAT GCC TAC GTT ACG CTC ATG ATG TTT 1392

CTC GGA GAA AAA GTG GAT AAT GAA ATT GAA AAG AGA GTC GAT ATA GAC 1440

AAC AAC GGC ATG GTT GAC TTA AAT GAC GTC ATG TTG ATT CTC CAG GAA 1488

GCT TGA 1494



(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 1755 NUCLEOTIDES

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:



ATG CCA AGA AAT ATC GCC GCT GTA TGC GCC CTG GCC GCT TTG TTA GGG 48

TCG GCC TGG GCG GCC AAA GTT GCC GTC TAC CCC TAC GAC GGA GCC GCT 96 

TTG CTG GCG GGG CAG CGC TTC GAT TTG CGC ATA GAA GCC TCC GAG CTG 144 




WO 97/48416 PCT/US97/10784 



AAA GGC AAT TTA AAG GCT TAC CGC ATC ACC CTG GAC GGC CAG CCT CTG 192

GCG GGC CTC GAG CAA ACC GCG CAG GGG GCC GGG CAG GCC GAG TGG ACC 240

CTG CGC GGT GCC TTC CTG CGC CCT GGA AGC CAC ACC CTC GAG GTC AGC 288

CTC ACC GAC GAC GCT GGG GAG AGC AGG AAG AGC GTA CGT TGG GAG GCT 336

CGG CAG AAC CTT CGC TTG CCC CGA GCG GCC AAG AAT GTG ATT CTC TTC 384

ATT GGC GAC GGG ATG GGC TGG AAC ACC CTC AAC GCC GCC CGC ATC ATC 432

GCC AAA GGC TTT AAC CCC GAA AAC GGT ATG CCC AAC GGA AAC CTC GAG 480

ATC GAG AGT GGT TAC GGT GGG ATG GCT ACC GTC ACT ACC GGC AGC TTT 528

GAT AGC TTC ATC GCC GAC TCA GCT AAC TCG GCT TCT TCC ATC ATG ACC 576

GGG CAG AAG GTG CAG GTG AAT GCC CTC AAC GTT TAC CCA TCA AAC CTC 624

AAA GAT ACC CTG GCC TAC CCC CGG ATC GAA ACC CTA GCG GAG ATG CTC 672

AAG CGG GTA CGC GGG GCC AGC ATT GGG GTA GTG ACC ACC ACC TTC GGC 720

ACC GAC GCT ACC CCG GCT TCA CTC AAC GCC CAT ACC CGC CGC CGC GGT 768

GAT TAC CAG GCT ATC GCC GAC ATG TAC TTT GGT AGA GGC GGG TTC GGT 816

GTT CCC TTG GAT GTG ATG CTC TTC GGT GGT TCA CGC GAC TTC ATC CCC 864

CAG AGC ACC CCT GGC TCG CGG CGC AAG GAT AGC ACG GAC TGG ATT GCC 912

GAA TCC CAG AAG CTG GGC TAC ACC TTT GTC AGC ACC CGC AGC GAG CTG 960

CTG GCG GCC AAA CCC ACC GAT AAG CTG TTT GGG CTG TTC AAC ATT GAC 1008

AAC TTC CCC AGC TAC CTA GAC CGC GCA GTG TGG AAG CGG CCC GAG ATG 1056

CTG GGA AGC TTT ACC GAT ATG CCC TAC CTC TGG GAG ATG ACC CAG AAA 1104

GCC GTG GAG GCT CTC TCC AGA AAC GAC AAA GGC TTT TTC TTG ATG GTT 1152

GAG GGG GGA ATG GTG GAT AAG TAC GAG CAC CCC TTG GAC TGG CCC CGC 1200

GCA CTT TGG GAT GTA CTC GAG CTG GAC CGC GCG GTG GCT TGG GCC AAG 1248

GGC TAT GCG GCC TCC CAC CCC GAT ACC CTG GTG ATT GTC ACC GCC GAC 1296

CAC GCT CAC TCG ATC NNN GTG TTT GGC GGT TAC GAC TAC TCC AAG CAG 1344

GGC CGG GAG GGG GTG GGG GTT TAT GAG GCC GCC AAG TTC CCC ACC TAC 1392

GGC GAC AAA AAA GAC GCC AAC GGC TTT CCC TTG CCC GAC ACC ACT CGG 1440

GGA ATC GCG GTA GGC TTC GGG GCC ACN NNN NNN TAC TGT GAA ACC TAC 1488

CGG GGC CGC GAG GTC TAC AAA GAC CCC ACC ATC TCC GAC GGC AAA GGT 1536

GGT TAC GTG GCC AAC CCT GAG GTC TGC AAG GAG CCG GGC CTT CCA ACG 1584

TAC CGG CAA CTC CCA GTA GAT AGC GCC CAG GGC GTG CAC ACG GCT GAT 1632

CCC ATG CCG CTG TTT GCC TTT GGC GTG GGG TCT CAG TTC TTC AAT GGC 1680




CTC ATC GAC CAG ACC GAG ATC TTC TTC CGC ATG GCC CAG GCC CTA GGG 1728

TTC AAC CCC CAC CTC GAG AAG CCT TAA 1755

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 912 NUCLEOTIDES

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

ATG TAT AAA TGG ATT ATT GAG GGT AAG CTT GCC CAA GCA CCT TTT CCA 48

AGC CTA GGT GAA CTA GCC GAT CTC AAA AGA CTT TTC GAC GCC ATT ATT 96

GTT CTT ACA ATG CCG CAT GAA CAA CCG CTT AAT GAG AAA TAT ATC GAG 144

ATA TTA GAG AGC CAT GGA TTC CAA GTC CTC CAT GTC CCC ACG CTC GAC 192

TTT CAT CCT TTA GAA CTC TTC GAC CTT TTG AAA ACA AGC ATA TTC ATT 240

GAT GAA AAC CTG GAG AGA TCC CAC AGA GTG CTT GTC CAC TGC ATG GGA 288

GGC ATA GGC CGG AGC GGG CTT GTA ACT GCT GCG TAC TTA ATA TTC AAA 336

GGT TAT GAT ATT TAC GAC GCG GTA AAG CAT GTG AGA ACG GTA GTG CCT 384

GGT GCT ATT GAA AAC AGA GGG CAA GCG TTA ATG CTT GAG AAC TAC TAT 432

ACC CTG GTC AAA AGT TTC AAC AGA GAG TTG CTG AGA GAC TAC GGG AAG 480

AAA ATT TTC ACG CTC GGT GAC CCG AAG GCG GTT CTC CAC GCT TCT AAG 528

ACG ACT CAG TTC ACG ATT GAA CTC TTA AGC AAC TTA CAC GTC AAC GAG 576

GCG TTT TCA ATC AGT GCG ATG GCT CAA TCA CTG CTC CAC TTT CAC GAC 624

GTA AAA GTC CGC TCT AAA CTG AAA GAA GTA TTC GAA AAC ATG GAA TTC 672

TCA TCC GCC TCA GAG GAG GTT CTG TCA TTT ATT CAC CTA CTC GAT TTC 720

TAT CAG GAT GGC AGG GTT GTT TTA ACC ATT TAC GAT TAT CTC CCC GAT 768

AGG GTG GAT TTG ATT TTA TTG TGT AAG TGG GGT TGT GAT AAA ATA GTT 816 

GAA GTC TCG TCT TCA GCG AAG AAA ACC GTT GAG AAG CTT GTA GGA AGA 864 

AAG GTT TCC CTA TCC TGG GCT AAT TAC TTA GAC TAT GTT TAG 912 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 774 NUCLEOTIDES

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

ATG AGA ATC CTC CTC ACC AAC GAC GAC GGC ATC TAT TCC AAC GGT CTG 48

CGC GCG GCG GTG AAG GGC CTG AGC GAG CTC GGC GAG GTC TAC GTC GTC 96

GCC CCG CTC TTC CAG AGG AGC GCG AGC GGT CGG GCG ATG ACC CTA CAC 144

AGG CCG ATA AGG GCA AAG AGG GTT GAC GTT CCC GGC GCG AAG ATA GCG 192 



TAT NNN ATA NNN GGA ACG CCG ACC GAC TGC GTG ATT TTT GCC ATC GCC 240

CGC TTC NNN NAN TTT GAT CTG GCG GTC AGC GGG ATA AAC CTA GGC GAG 288

AAC CTG AGC ACG GAG ATA ACC GTC NNN NNN NNN NNN NNN NNN NNN ATA 336

GAG GCT TCC ACC CAC GGG ATT CCA AGT GTA GCT ATA AGC CTC GAG GTC 384

GAG TGG AAG AAG ACC CTC GGC GAG GGG GAG GGT ATT GAC TTC TCG GTT 432

TCA GCA CAC TTC CTG AGA AGG ATA GCG ACG GCT GTC CTT AAG AAG GGC 480

CTG CCT GAA GGG GTG GAC ATG CTC AAC GTG AAC GTC CCT AGC GAC GCC 528

AGC GAG GGG ACT GAG ATC GCC ATA ACG CGC CTC GCG AGG AAG CGC TAT 576

TCT CCG ACG ATA GAG GAG AGG ATA GAC CCC AAG GGC AAC CCC TAC TAC 624

TGG ATC GTT GGC AGG CTC GTC CAG GAG TTC GAG CCG GGC ACG GAC GCC 672

TAC GCT CTG AAA GTC GAG AGA AAG GTC AGC GTC ACG CCC ATA AAC ATC 720

GAC ATG ACT GCG AGG GTT GAC TTT GAG AAC CTT CAA AGG CTT CTG AGC 768

CTG TGA 774



(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 795 NUCLEOTIDES

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:

ATG GAA AAC TTA AAA AAG TAC CTA GAA GTT GCA AAA ATA GCC GCG CTC 48 

GCG GGT GGG CAG GTT CTG AAA GAA AAC TTC GGA AAG GTA AAA AAG GAA 96 

AAC ATA GAG GAA AAA GGG GAA AAG GAC TTT GTA AGT TAC GTG GAT AAA 144 

ACT TCA GAG GAA AGG ATA AAG GAG GTG ATA CTC AAG TTC TTT CCC GAT 192

CAC GAG GTC GTA GGG GAA GAG ATG GGT GCG GAG GGA AGC GGA AGC GAA 240

TAC AGG TGG TTC ATA GAC CCC CTT GAC GGC ACA AAG AAC TAC ATA AAC 288

GGT TTT CCC ATC TTT GCC GTA TCA GTG GGA CTT GTT AAG GGA GAA GAG 336

CCA ATT GTG GGT GCG GTT TAC CTT CCT TAC TTT GAC AAG CTT TAC TGG 384

GGT GCT AAA GGT CTC GGG GCT TAC GTA AAC GGA AAG AGG ATA AAG GTA 432

AAG GAC AAT GAG AGT TTA AAG CAC GCC GGA GTG GTT TAC GGA TTT CCC 480

TCT AGG AGC AGG AGG GAC ATA TCT ATC TAC TTG AAC ATA TTC AAG GAT 528

GTC TTT TAC GAA GTT GGC TCT ATG AGG AGA CCC GGG GCT GCT GCG GTT 576

GAC CTC TGC ATG NNN NNN NNN NNN ATA NNN GAC GGG ATG ATG GAG TTT 624

GAA ATG AAG CCG TGG GAC ATA ACC GCA GGG CTT GTA ATA CTG AAG GAA 672

GCC GGG GGC GTT TAC ACA CTT GTG GGA GAA CCC TTC GGA GTT TCG GAC 720

ATA ATT GCG GGC AAC AAA GCC CTC CAC GAC TTT ATA CTT CAG GTA GCC 768

AAA AAG TAT ATG GAA GTG GCG GTG TGA 795



(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 260 AMINO ACIDS
(B) TYPE: AMINO ACID
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

Met Arg Gly Ser Gly Val Arg Ile Leu Leu Thr Asn Asp Asp Gly Ile
5 10 15

Phe Ala Glu Gly Leu Gly Ala Leu Arg Lys Met Leu Glu Pro Val Ala
20 25 30

Thr Leu Tyr Val Val Ala Pro Asp Arg Glu Arg Ser Ala Ala Ser His
35 40 45

Ala Ile Thr Val His Arg Pro Leu Arg Val Arg Glu Ala Gly Phe Arg
50 55 60

Ser Pro Arg Leu Lys Gly Trp Val Val Asp Gly Thr Pro Ala Asp Cys
65 70 75 80

Val Lys Leu Gly Leu Glu Val Leu Leu Pro Glu Arg Pro Asp Phe Leu
85 90 95

Val Ser Gly Ile Asn Tyr Gly Pro Asn Leu Gly Thr Asp Val Leu Tyr
100 105 110

Ser Gly Thr Val Ser Ala Ala Ile Glu Gly Val Ile Asn Gly Ile Pro
115 120 125

Ser Val Ala Val Ser Leu Ala Thr Arg Arg Glu Pro Asp Tyr Thr Trp
130 135 140

Ala Ala Arg Phe Val Leu Val Leu Leu Glu Glu Leu Arg Lys His Gln
145 150 155 160

Leu Pro Pro Gly Thr Leu Leu Asn Val Asn Val Pro Asp Gly Val Pro
165 170 175

Arg Gly Val Lys Val Thr Lys Leu Gly Ser Val Arg Tyr Val Asn Val
180 185 190

Val Asp Cys Arg Thr Asp Pro Arg Gly Lys Ala Tyr Tyr Trp Met Ala
195 200 205

Gly Glu Pro Leu Glu Leu Asp Gly Asn Asp Ser Glu Thr Asp Val Trp
210 215 220

Ala Val Arg Glu Gly Tyr Ile Ser Val Thr Pro Val Gln Ile Asp Leu
225 230 235 240

Thr Asn Tyr Gly Phe Leu Glu Glu Leu Lys Lys Trp Arg Phe Lys Asp
245 250 255

Ile Phe Ser Ser
260



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 265 AMINO ACIDS

(B) TYPE: AMINO ACID
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

Met Leu Asp Ile Leu Leu Val Asn Asp Asp Gly Ile Tyr Ser Asn Gly
5 10 15

Leu Ile Ala Leu Lys Asp Ala Leu Leu Glu Lys Phe Asn Ala Arg Ile
20 25 30

Thr Ile Val Ala Pro Thr Asn Gln Gln Ser Gly Ile Gly Arg Ala Ile
35 40 45

Ser Leu Phe Glu Pro Leu Arg Ile Thr Lys Thr Lys Leu Ala Asp Gly
50 55 60

Ser Trp Gly Tyr Ala Val Ser Gly Thr Pro Thr Asp Cys Val Ile Leu
65 70 75 80

Gly Ile Tyr Glu Ile Leu Lys Lys Val Pro Asp Val Val Ile Ser Gly
85 90 95

Ile Asn Ile Gly Glu Asn Leu Gly Thr Glu Ile Thr Thr Ser Gly Thr
100 105 110

Leu Gly Ala Ala Phe Glu Gly Ala His His Gly Ala Lys Ala Leu Ala
115 120 125

Ser Ser Leu Gln Val Thr Ser Asp His Leu Lys Phe Lys Glu Gly Glu
130 135 140

Thr Pro Ile Asp Phe Thr Val Pro Ala Arg Ile Thr Ala Asn Val Val
145 150 155 160

Glu Lys Met Leu Asp Tyr Asp Phe Pro Cys Asp Val Val Asn Leu Asn
165 170 175

Ile Pro Glu Gly Ala Thr Glu Lys Thr Pro Ile Glu Ile Thr Arg Leu
180 185 190

Ala Arg Lys Met Tyr Thr Thr His Val Glu Glu Arg Ile Asp Pro Arg
195 200 205

Gly Arg Ser Tyr Tyr Trp Ile Asp Gly Tyr Pro Ile Leu Glu Glu Glu
210 215 220

Glu Asp Thr Asp Val Tyr Val Val Arg Arg Lys Gly His Ile Ser Leu
225 230 235 240

Thr Pro Leu Thr Leu Asp Thr Thr Ile Lys Asn Leu Glu Glu Phe Lys
245 250 255

Lys Lys Tyr Glu Arg Ile Leu Asn Glu
260 265



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 254 AMINO ACIDS

(B) TYPE: AMINO ACID
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

Met Met Met Glu Phe Thr Arg Glu Gly Ile Lys Ala Ala Val Glu Ala
5 10 15

Leu Gln Gly Leu Gly Glu Ile Tyr Val Val Ala Pro Met Phe Gln Arg
20 25 30

Ser Ala Ser Gly Arg Ala Met Thr Ile His Arg Pro Leu Arg Ala Lys
35 40 45

Arg Ile Ser Met Asn Gly Ala Lys Ala Ala Tyr Ala Leu Asp Gly Met
50 55 60

Pro Val Asp Cys Val Ile Phe Ala Met Ala Arg Phe Gly Asp Phe Asp
65 70 75 80

Leu Ala Ile Ser Gly Val Asn Leu Gly Glu Asn Met Ser Thr Glu Ile
85 90 95

Thr Val Ser Gly Thr Ala Ser Ala Ala Ile Glu Ala Ala Thr Gln Glu
100 105 110

Ile Pro Ser Ile Pro Ile Ser Leu Glu Val Asn Arg Glu Lys His Lys
115 120 125

Phe Gly Glu Gly Glu Glu Ile Asp Phe Ser Ala Ala Lys Tyr Phe Leu
130 135 140

Arg Lys Ile Ala Thr Ala Val Leu Lys Arg Gly Leu Pro Lys Gly Val
145 150 155 160

Asp Met Leu Asn Val Asn Val Pro Tyr Asp Ala Asn Glu Arg Thr Glu
165 170 175

Ile Ala Phe Thr Arg Leu Ala Arg Arg Met Tyr Arg Pro Ser Ile Glu
180 185 190

Glu Arg Ile Asp Pro Lys Gly Asn Pro Tyr Tyr Trp Ile Val Gly Thr
195 200 205

Gln Cys Pro Lys Glu Ala Leu Glu Pro Gly Thr Asp Met Tyr Val Val
210 215 220

Lys Val Glu Arg Lys Val Ser Val Thr Pro Ile Asn Ile Asp Met Thr
225 230 235 240

Ala Arg Val Asn Leu Asp Glu Ile Lys Arg Leu Leu Glu Leu
245 250

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 271 AMINO ACIDS

(B) TYPE: AMINO ACID
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

Met Arg Thr Leu Thr Ile Asn Thr Asp Ala Glu Gly Phe Val Leu Arg
5 10 15

Ile Leu Leu Thr Asn Asp Asp Gly Ile Tyr Ser Asn Gly Leu Arg Ala
20 25 30

Ala Val Lys Ala Leu Ser Glu Leu Gly Glu Val Tyr Val Val Ala Pro
35 40 45

Leu Phe Gln Arg Ser Ala Ser Gly Arg Ala Met Thr Leu His Arg Pro
50 55 60

Ile Arg Ala Lys Arg Val Asp Val Pro Gly Ala Lys Ile Ala Tyr Gly
65 70 75 80

Ile Asp Gly Thr Pro Thr Asp Cys Val Ile Phe Ala Ile Ala Arg Phe
85 90 95

Gly Ser Phe Gly Leu Ala Val Ser Gly Ile Asn Leu Gly Glu Asn Leu
100 105 110

Ser Thr Glu Ile Thr Val Ser Gly Thr Ala Ser Ala Ala Ile Glu Ala
115 120 125

Ser Thr His Gly Ile Pro Ser Ile Ala Ile Ser Leu Glu Val Glu Trp
130 135 140

Lys Lys Thr Leu Gly Glu Gly Glu Gly Val Asp Phe Ser Val Ser Thr
145 150 155 160

His Phe Leu Lys Arg Ile Ala Gly Ala Leu Leu Glu Arg Gly Leu Pro
165 170 175

Glu Gly Val Asp Met Leu Asn Val Asn Val Pro Ser Asp Ala Thr Glu
180 185 190

Glu Thr Glu Ile Ala Ile Thr Arg Leu Ala Arg Lys Arg Tyr Ser Pro
195 200 205

Thr Val Glu Glu Arg Ile Asp Pro Lys Gly Asn Pro Tyr Tyr Trp Ile
210 215 220

Val Gly Lys Leu Val Gln Asp Phe Glu Pro Gly Thr Asp Ala Tyr Ala
225 230 235 240

Leu Lys Val Glu Arg Lys Val Ser Val Thr Pro Ile Asn Ile Asp Met
245 250 255

Thr Ala Arg Val Asp Phe Glu Glu Leu Val Arg Val Leu Trp Val
260 265 270



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 497 AMINO ACIDS 

(B) TYPE: AMINO ACID
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

Met Lys Gly Lys Ser Leu Val Ser Gly Leu Leu Leu Gly Leu Leu Ile
5 10 15

Leu Ser Leu Ile Ser Phe Gln Pro Ser Phe Ala Tyr Ser Pro His Gly
20 25 30

Gly Val Lys Asn Ile Ile Ile Leu Val Gly Asp Gly Met Gly Leu Gly
35 40 45

His Val Glu Ile Thr Lys Leu Val Tyr Gly His Leu Asn Met Glu Asn
50 55 60

Phe Pro Val Thr Gly Phe Glu Leu Thr Asp Ser Leu Ser Gly Glu Val
65 70 75 80

Thr Asp Ser Ala Ala Ala Gly Thr Ala Ile Ser Thr Gly Ala Lys Thr
85 90 95

Tyr Asn Gly Met Ile Ser Val Thr Asn Ile Thr Gly Lys Ile Val Asn
100 105 110

Leu Thr Thr Leu Leu Glu Val Ala Gln Glu Leu Gly Lys Ser Thr Gly
115 120 125

Leu Val Thr Thr Thr Arg Ile Thr His Ala Thr Pro Ala Val Phe Ala
130 135 140

Ser His Val Pro Asp Arg Asp Met Glu Gly Glu Ile Pro Lys Gln Leu
145 150 155 160

Ile Met His Lys Val Asn Val Leu Leu Gly Gly Gly Arg Glu Lys Phe
165 170 175

Asp Glu Lys Asn Leu Glu Leu Ala Lys Lys Gln Gly Tyr Lys Val Val
180 185 190

Phe Thr Lys Glu Glu Leu Glu Lys Val Glu Gly Asp Tyr Val Leu Gly
195 200 205

Leu Phe Ala Glu Ser His Ile Pro Tyr Val Leu Asp Arg Lys Pro Asp
210 215 220

Asp Val Gly Leu Leu Glu Met Ala Lys Lys Ala Ile Ser Ile Leu Glu
225 230 235 240

Lys Asn Pro Ser Gly Phe Phe Leu Met Val Glu Gly Gly Arg Ile Asp
245 250 255

His Ala Ala His Gly Asn Asp Val Ala Ser Val Val Ala Glu Thr Lys
260 265 270

Glu Phe Asp Asp Val Val Arg Tyr Val Leu Glu Tyr Pro Lys Lys Arg
275 280 285

Gly Asp Thr Leu Val Ile Val Leu Ala Asp His Glu Thr Gly Gly Leu
290 295 300

Ala Ile Gly Leu Thr Tyr Gly Asn Ala Ile Asp Glu Asp Ala Ile Arg
305 310 315 320

Lys Ile Lys Ala Ser Thr Leu Arg Met Pro Lys Glu Val Lys Ala Gly
325 330 335

Ser Ser Val Lys Glu Ser Ser Lys Val Cys Arg Ile Cys Pro Asn Arg
340 345 350

Gly Arg Ser Gln Tyr Ile Glu Asn Ala Leu His Ser Thr Asn Lys Tyr
355 360 365

Ala Leu Ser Asn Ala Val Ala Asp Val Ile Asn Arg Arg Ile Gly Val
370 375 380

Gly Phe Thr Ser Tyr Glu His Thr Gly Val Pro Val Pro Leu Leu Ala
385 390 395 400

Tyr Gly Pro Gly Ala Glu Asn Phe Arg Gly Phe Leu His His Val Asp
405 410 415

Thr Ala Arg Leu Val Ala Lys Leu Met Leu Phe Gly Arg Arg Asn Ile
420 425 430

Pro Val Thr Ile Ser Ser Val Ser Ser Val Lys Gly Asp Ile Thr Gly
435 440 445

Asp Tyr Arg Val Asp Glu Lys Asp Ala Tyr Val Thr Leu Met Met Phe
450 455 460

Leu Gly Glu Lys Val Asp Asn Glu Ile Glu Lys Arg Val Asp Ile Asp
465 470 475 480

Asn Asn Gly Met Val Asp Leu Asn Asp Val Met Leu Ile Leu Gln Glu
485 490 495

Ala
497



(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 584 AMINO ACIDS

(B) TYPE: AMINO ACID
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

Met Pro Arg Asn Ile Ala Ala Val Cys Ala Leu Ala Ala Leu Leu Gly
5 10 15

Ser Ala Trp Ala Ala Lys Val Ala Val Tyr Pro Tyr Asp Gly Ala Ala
20 25 30

Leu Leu Ala Gly Gln Arg Phe Asp Leu Arg Ile Glu Ala Ser Glu Leu
35 40 45

Lys Gly Asn Leu Lys Ala Tyr Arg Ile Thr Leu Asp Gly Gln Pro Leu
50 55 60

Ala Gly Leu Glu Gln Thr Ala Gln Gly Ala Gly Gln Ala Glu Trp Thr
65 70 75 80

Leu Arg Gly Ala Phe Leu Arg Pro Gly Ser His Thr Leu Glu Val Ser
85 90 95

Leu Thr Asp Asp Ala Gly Glu Ser Arg Lys Ser Val Arg Trp Glu Ala
100 105 110

Arg Gln Asn Leu Arg Leu Pro Arg Ala Ala Lys Asn Val Ile Leu Phe
115 120 125

Ile Gly Asp Gly Met Gly Trp Asn Thr Leu Asn Ala Ala Arg Ile Ile
130 135 140

Ala Lys Gly Phe Asn Pro Glu Asn Gly Met Pro Asn Gly Asn Leu Glu
145 150 155 160

Ile Glu Ser Gly Tyr Gly Gly Met Ala Thr Val Thr Thr Gly Ser Phe
165 170 175

Asp Ser Phe Ile Ala Asp Ser Ala Asn Ser Ala Ser Ser Ile Met Thr
180 185 190

Gly Gln Lys Val Gln Val Asn Ala Leu Asn Val Tyr Pro Ser Asn Leu
195 200 205

Lys Asp Thr Leu Ala Tyr Pro Arg Ile Glu Thr Leu Ala Glu Met Leu
210 215 220

Lys Arg Val Arg Gly Ala Ser Ile Gly Val Val Thr Thr Thr Phe Gly
225 230 235 240

Thr Asp Ala Thr Pro Ala Ser Leu Asn Ala His Thr Arg Arg Arg Gly
245 250 255

Asp Tyr Gln Ala Ile Ala Asp Met Tyr Phe Gly Arg Gly Gly Phe Gly
260 265 270

Val Pro Leu Asp Val Met Leu Phe Gly Gly Ser Arg Asp Phe Ile Pro
275 280 285

Gln Ser Thr Pro Gly Ser Arg Arg Lys Asp Ser Thr Asp Trp Ile Ala
290 295 300

Glu Ser Gln Lys Leu Gly Tyr Thr Phe Val Ser Thr Arg Ser Glu Leu
305 310 315 320

Leu Ala Ala Lys Pro Thr Asp Lys Leu Phe Gly Leu Phe Asn Ile Asp
325 330 335

Asn Phe Pro Ser Tyr Leu Asp Arg Ala Val Trp Lys Arg Pro Glu Met
340 345 350

Leu Gly Ser Phe Thr Asp Met Pro Tyr Leu Trp Glu Met Thr Gln Lys
355 360 365

Ala Val Glu Ala Leu Ser Arg Asn Asp Lys Gly Phe Phe Leu Met Val
370 375 380

Glu Gly Gly Met Val Asp Lys Tyr Glu His Pro Leu Asp Trp Pro Arg
385 390 395 400

Ala Leu Trp Asp Val Leu Glu Leu Asp Arg Ala Val Ala Trp Ala Lys
405 410 415

Gly Tyr Ala Ala Ser His Pro Asp Thr Leu Val Ile Val Thr Ala Asp
420 425 430

His Ala His Ser Ile Ser Val Phe Gly Gly Tyr Asp Tyr Ser Lys Gln
435 440 445

Gly Arg Glu Gly Val Gly Val Tyr Glu Ala Ala Lys Phe Pro Thr Tyr
450 455 460

Gly Asp Lys Lys Asp Ala Asn Gly Phe Pro Leu Pro Asp Thr Thr Arg
465 470 475 480

Gly Ile Ala Val Gly Phe Gly Ala Thr Pro Asp Tyr Cys Glu Thr Tyr
485 490 495

Arg Gly Arg Glu Val Tyr Lys Asp Pro Thr Ile Ser Asp Gly Lys Gly
500 505 510

Gly Tyr Val Ala Asn Pro Glu Val Cys Lys Glu Pro Gly Leu Pro Thr
515 520 525

Tyr Arg Gln Leu Pro Val Asp Ser Ala Gln Gly Val His Thr Ala Asp
530 535 540

Pro Met Pro Leu Phe Ala Phe Gly Val Gly Ser Gln Phe Phe Asn Gly
545 550 555 560

Leu Ile Asp Gln Thr Glu Ile Phe Phe Arg Met Ala Gln Ala Leu Gly
565 570 575

Phe Asn Pro His Leu Glu Lys Pro
580

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS
(A) LENGTH: 301 AMINO ACIDS

(B) TYPE: AMINO ACID
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:

Met Tyr Lys Trp Ile Ile Glu Gly Lys Leu Ala Gln Ala Pro Phe Pro
5 10 15

Ser Leu Gly Glu Leu Ala Asp Leu Lys Arg Leu Phe Asp Ala Ile Ile
20 25 30

Val Leu Thr Met Pro His Glu Gln Pro Leu Asn Glu Lys Tyr Ile Glu
35 40 45

Ile Leu Glu Ser His Gly Phe Gln Val Leu His Val Pro Thr Leu Asp
50 55 60

Phe His Pro Leu Glu Leu Phe Asp Leu Leu Lys Thr Ser Ile Phe Ile
65 70 75 80

Asp Glu Asn Leu Glu Arg Ser His Arg Val Leu Val His Cys Met Gly
85 90 95

Gly Ile Gly Arg Ser Gly Leu Val Thr Ala Ala Tyr Leu Ile Phe Lys
100 105 110

Gly Tyr Asp Ile Tyr Asp Ala Val Lys His Val Arg Thr Val Val Pro
115 120 125

Gly Ala Ile Glu Asn Arg Gly Gln Ala Leu Met Leu Glu Asn Tyr Tyr
130 135 140

Thr Leu Val Lys Ser Phe Asn Arg Glu Leu Leu Arg Asp Tyr Gly Lys
145 150 155 160

Lys Ile Phe Thr Leu Gly Asp Pro Lys Ala Val Leu His Ala Ser Lys
165 170 175

Thr Thr Gln Phe Thr Ile Glu Leu Leu Ser Asn Leu His Val Asn Glu
180 185 190

Ala Phe Ser Ile Ser Ala Met Ala Gln Ser Leu Leu His Phe His Asp
195 200 205

Val Lys Val Arg Ser Lys Leu Lys Glu Val Phe Glu Asn Met Glu Phe
210 215 220

Ser Ser Ala Ser Glu Glu Val Leu Ser Phe Ile His Leu Leu Asp Phe
225 230 235 240

Tyr Gln Asp Gly Arg Val Val Leu Thr Ile Tyr Asp Tyr Leu Pro Asp
245 250 255

Arg Val Asp Leu Ile Leu Leu Cys Lys Trp Gly Cys Asp Lys Ile Val
260 265 270

Glu Val Ser Ser Ser Ala Lys Lys Thr Val Glu Lys Leu Val Gly Arg
275 280 285

Lys Val Ser Leu Ser Trp Ala Asn Tyr Leu Asp Tyr Val
290 295 300



(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 257 AMINO ACIDS

(B) TYPE: AMINO ACID 
(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: PROTEIN 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

Met Arg Ile Leu Leu Thr Asn Asp Asp Gly Ile Tyr Ser Asn Gly Leu
5 10 15

Arg Ala Ala Val Lys Gly Leu Ser Glu Leu Gly Glu Val Tyr Val Val
20 25 30

Ala Pro Leu Phe Gln Arg Ser Ala Ser Gly Arg Ala Met Thr Leu His
35 40 45

Arg Pro Ile Arg Ala Lys Arg Val Asp Val Pro Gly Ala Lys Ile Ala
50 55 60

Tyr Gly Ile Asp Gly Thr Pro Thr Asp Cys Val Ile Phe Ala Ile Ala
65 70 75 80

Arg Phe Gly Asp Phe Asp Leu Ala Val Ser Gly Ile Asn Leu Gly Glu
85 90 95

Asn Leu Ser Thr Glu Ile Thr Val Ser Gly Thr Ala Ser Ala Ala Ile
100 105 110

Glu Ala Ser Thr His Gly Ile Pro Ser Val Ala Ile Ser Leu Glu Val
115 120 125

Glu Trp Lys Lys Thr Leu Gly Glu Gly Glu Gly Ile Asp Phe Ser Val
130 135 140

Ser Ala His Phe Leu Arg Arg Ile Ala Thr Ala Val Leu Lys Lys Gly
145 150 155 160

Leu Pro Glu Gly Val Asp Met Leu Asn Val Asn Val Pro Ser Asp Ala
165 170 175

Ser Glu Gly Thr Glu Ile Ala Ile Thr Arg Leu Ala Arg Lys Arg Tyr
180 185 190

Ser Pro Thr Ile Glu Glu Arg Ile Asp Pro Lys Gly Asn Pro Tyr Tyr
195 200 205

Trp Ile Val Gly Arg Leu Val Gln Glu Phe Glu Pro Gly Thr Asp Ala
210 215 220

Tyr Ala Leu Lys Val Glu Arg Lys Val Ser Val Thr Pro Ile Asn Ile

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 264 AMINO ACIDS

(B) TYPE: AMINO ACID

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

Met Glu Asn Leu Lys Lys Tyr Leu Glu Val Ala Lys Ile Ala Ala Leu
5 10 15

Ala Gly Gly Gln Val Leu Lys Glu Asn Phe Gly Lys Val Lys Lys Glu
20 25 30

Asn Ile Glu Glu Lys Gly Glu Lys Asp Phe Val Ser Tyr Val Asp Lys
35 40 45

Thr Ser Glu Glu Arg Ile Lys Glu Val Ile Leu Lys Phe Phe Pro Asp
50 55 60

His Glu Val Val Gly Glu Glu Met Gly Ala Glu Gly Ser Gly Ser Glu
65 70 75 80

Tyr Arg Trp Phe Ile Asp Pro Leu Asp Gly Thr Lys Asn Tyr Ile Asn
85 90 95

Gly Phe Pro Ile Phe Ala Val Ser Val Gly Leu Val Lys Gly Glu Glu
100 105 110

Pro Ile Val Gly Ala Val Tyr Leu Pro Tyr Phe Asp Lys Leu Tyr Trp
115 120 125

Gly Ala Lys Gly Leu Gly Ala Tyr Val Asn Gly Lys Arg Ile Lys Val
130 135 140

Lys Asp Asn Glu Ser Leu Lys His Ala Gly Val Val Tyr Gly Phe Pro
145 150 155 160

Ser Arg Ser Arg Arg Asp Ile Ser Ile Tyr Leu Asn Ile Phe Lys Asp
165 170 175

Val Phe Tyr Glu Val Gly Ser Met Arg Arg Pro Gly Ala Ala Ala Val
180 185 190

Asp Leu Cys Met Val Ala Glu Gly Ile Phe Asp Gly Met Met Glu Phe
195 200 205

Glu Met Lys Pro Trp Asp Ile Thr Ala Gly Leu Val Ile Leu Lys Glu
210 215 220

Ala Gly Gly Val Tyr Thr Leu Val Gly Glu Pro Phe Gly Val Ser Asp
225 230 235 240

Ile Ile Ala Gly Asn Lys Ala Leu His Asp Phe Ile Leu Gln Val Ala
245 250 255

Lys Lys Tyr Met Glu Val Ala Val
260

Pyrolobus fumarius 1A (Iph7)
SEQ ID NO. 37 • •' . • * _ 

1 TGC CCG AGC GTG TTG CCA AGA TGC TTG AAA GAA TGC TAT CCA AGG -CGC. AAT CTA TGC TCG 

6 0 • . ' ' 1 

61 GCG ACG-C'CC AGA GGC TTA TCC AGG AGG GTA AGG CCG TTG AGG ■ CTA' AGA AGC -TGT TAG CGG 

120 

121 CTG CTC ATA GGC TAG TAG ATC CCC TAG 'AGC ATG CTC TCG ACC ACG CCC TCA ACC ATA TAG 
18 0 ' 

181 AGC ATC ACA AGG AAC ATC ATG AGG AGC ACC ACA AGG AGC ACG ACT AAC AAC ACT CTT AGA ' 
240 ' • " 

241 ATC TCG AGA CGA ',GCT TGC TTC .CCG TGT CTC TCG CGC CTA GCC ACT TTT TAA TAG CCT AAC 
300 ' 

30X CCG AGA CCC ACA TTC CAA CAT TAC TCC GTT TGT CAC TAT CAT CTT CTA ATT GTC ACA CGC 

360 > ' .'; . 

3 61 CCC GTA TAA ATT GGG CGA CCT GGA GGA AGC GTT GCC GGT GAC CCC GCG TGC CCA AGA AGG 

420 

421 CTG TCT.CCC CAA TAT GCG GTG GCG ATG TTG AAC TAC CCG ATA ACG TAA TGG ATG, GCG AGA 

480 * ' ' ■ ' ■ ' ' / 

481 TCG TGG AGC ACG ACT GTG GGG CAA TGC TAG TCG TGA GGA TCC 'GGG ATG GCA ATG TTG TTC 

540 

S41 TAG AGC AGT TGG AGC GCG TTG AGG AGG . ACT GGG GAG AGT AGA GGC TAT GCG CAT AGC .AAT 

600 ' ; ' . . 

601 GGT TTA TGA CCA TCC GCG TGT TGA GGA GAA GAG GTT AGC TGA GGA AGC GAG GAA GCT TGG 

660 , ' " ■ 4 ■ 

661 TCA CGA ACC TGT CCT CTT TAA TAT TGA CTC GTT GCT CTT TCG CCT TGA TAG CCT GGA GCG 

720 _ ■ t 

721 CAT TCT AGG CGA TGT TGA TGT AGT ACT TCA GAG GGC GGT GAG TTA CTT CAA GGC TCT CGA 

780 ' ■ . 

* 781 GTC TAC AAG , GAT ACT CGA GGC* TGC CGG CTA CAC TGT CAT CAA CAA TAG TTT ACT GCA GCT t 

840 t ■ ■ » . . ' • • ' 

841 TAA CTG CGG CGA CAA ACT ATT GAC AAC GAT CTT GCT. TGC TAA GCA TGG TGT GCC AAC ACC - 

900 ., . * i '* . , • 

901 GCG TGC ATA CGC TGC TTT TTC GCG TGA CAC TGC TGT GCG GGC TGC AGA GGA GCT TGG ATA

960

961 CCC CGT TGT TGT CAA GCC CGT CAT TGC TAG TTG GGG TAG GCT TGT GGC TAG GGC TGA TTC
1020

1021 CAG GGA GAG TCT AGA GGC TGT GAT AGA GCA TAG AGA GGT TCT CGC CCC GGC TTA CTA CAA
1080

1081 GGT TCA TTA TGT GCA AGA GTA TGT GCG CAA GCC TCT ACG TGA CAT ACG CGT ATT CGT GAT
1140

1141 TGG TCA TGA GGT TCC CGT GGC GAT ATA CAG CGT TAA CGA GCG TCA TTG GAA CAC TAA CAC
1200

1201 GGC ACT AGG CGC CAA GGC CGA GCC TGC GCC ACT GAC CCC CGA GTT ACC TGA GTT AGC GCT
1260



1261 TCG CGC GGC CAA GGC TCT CGG TGG CCG TGT GCT TGG TAT AGA TGT GTT TGA AGA CCC GGA 
1320 

1321 GAG AGG CCT CCT CGT GAA CGA GAT TAA CGC GAA CTC GGA CTT CAA CAA CAC TGA GAG GGT
1380 



1381 GAC CGG GTT TAA CAT GGC TAG GGC TAT CGT CGA GTA TGC ACT GTC GGT CGC GAA GAG CTC 
1440 

1441 AAT GGA ATG GAT AGG GTA GAG GTG CTT CTG GAT GAG GCT AGG CGT GGC GCT ATA GAG GGT 
1500 

1501 GAC GCT CGC CGC GCA TGT GAA GCG GCA TTA AGG CTG GTT GAC GTT GTG CTC CGC GAG GGG 
1560 

1561 CCT AGG GTT GCA CAG GAG TCT GGG CGT GGG ATT GAA CCC GGT GAT GTA CTA CTA GCT GAG 
1620 

1621 CCT CTG AGC TTG AGA GCA GAG CAG GTG AAG GAG GAG CCC AAG GCG GAC AAT TGT CTG GAO
1680 

1681 CTC CCA AAG GCT GCA TTC CGC CTC TAT AAG CGG CTC CAG GGG ATG GAG TAA AGT TCG CAG 
1740 

1741 TGT GTT GCC CGT TTT AGC CTC TGC CTT ACT TTC TAC TCG CGT GAG GCG AGT GTC CCT TGA
1800 

1801 CAC GTT GCT GGC GCG AGC TGA GAA ACG ACC TCG AGA TGA TAC CCG AGA TCG TCG AGA AGC 
1860 

1861 AGA TCG AGG AGA CGA TAG TGC CGG AGG GTC TTG GCG AGC AAC GAC TTG TGT TCA TTC GCA 
1920 

1921 CCG GTG ATT CTT TCG CGG CCG CAC TTG TAG CCG AGC ATG CCC GCA TAG GCG TCG CAC GCG 
1980 

1981 ATC CTC TTG ATG TGC TAG TGG CTG GCG TTG ATG GGC CTG GCG ACG CTA TAC TCC TAA GCG
2040 

2041 TTG GTG GGC GCT CAA AAC GAG TTG TTG ACG CGG CTC GTT TCC TGT CTT CAC GTG GCT TTC 
2100 

2101 GTA TCA TAG CGG TCA CGG GTA ACG AGA GGA GTC CTC TCG CAC GCA CAG CAC ACG TTA CCG 
2160 

2161 TGA AGC TCG TCT ATT CTG ACC TCG CCT GTG GCA TGG GCG CCG CAC GCC ATG TCG CTA TGC
2220

2221 TTG CAG CGC TCT CCG CAT TGT TCA ACG CTA GAC CTC GTA TAC CCG AGA AGC TTG TTG ACG
2280

2281 AGC CCC TGC CTT TCG ACC CTC AGG CTG TGT ACG CGG GTG TGG GCG TTG GTG TAG CCT CTG
2340

2341 CCC TGT TCA TGG TCT TGA AGA TCT GCG AGT TGC TCG CAG ACT CCG CCA CCT GGT GCC ATC
2400

2401 TAG AGC AGT TCG CAC ACG CAC CTG TCT ATG GCA CCA GAA GCA ATA TAC TCG TCG TGT ATC
2460

2461 CGA TCC TCG TTG TGA GAG GAG CAC GCT AGA GGA GTA TCT CTC GGC CTT CCG GGA GGC CGG
2520

2521 GTT TGA GGT CAC CAC TGT ACC CGT GTT GAA CGA CCC TTG GTC TAC AGC TAT TCT CCA CGC

2580

2581 TAC GCT GGC CAT CTC CAG TGC TGC AGA GAG CGC CTT CAG TCG CGG CAT TGA GGA GCC CGG
2640

2641 ATA TCG TGC ACA TCC CGC GCT TAG CAG GCT AAC CAG GCT GAT CTA CCT AGA GGA GTA GAA
2700

2701 CCT CTC GAG GAC CGG TAT GTA GTG GTC TAG AGG CTT CCC GTC ATG GTG TAT CGC GAG GCC
2760

2761 TAT TCC TGC TCT CCT CGC GCC TTC CAC CTT CGG CTC ATA ATC ATC TAT GAA TGC TGT TTT
2820

2821 CGC TGG GTC CGC GCG AAG GAG TTG CAT CGC CGC CTC GTA TAT CTT TGT GTG TGG CTT GCA
2880

2881 AAA GCC GAC AAT ATC CCT CGT AAC CAC CGT ATC CAC GAG GTG GGC TAG ATC GTC ACG CTC
2940

2941 TAG AAG TAG ACG TAC GCA TTC GTA GCA CCA GTT GTT CGA GAC TAT GCC GAC CAG TAT CCC 
3000 

3001 GTT TCT CTT GGC CCA TCT TAG CAG CTC GTA TGT ACC CGG TGC TAC GTA TAC GCC AGA CAG 
3060

3061 CAC AGC TGA TTG CAA TAC CCT TGC TAA TGC CTC TGC CCT TGA GGG GGT CGG CGT CAA GCC
3120

3121 GTG TTT TGC GAG GAG CAC GGC AGC CGC ATA CAC TAT ACT TTG TTG CAC GGA GAC ATC CAG

3180

3181 CCT CCA CGT GTC CAT TAC ACG CCT CAC GCT ATC CyG CGT CGC GTC GGC CCC TAG GGC ACG
3240

3241 TAG ATG TCT GGC AGC AGT CTC GTA GAG AGT CTC CTC GTA CCA CTC ATT TGT GAG GTA AAT
3300

3301 GAC GCC ACC TAA ATC CAG CAG GAG TGT AGG GTT ACG CGG CAA GGC GCC TCC TCA TGT ATT
3360

3361 CGA GGA GGC CGC CCG TTG CCA GAA TTT CAG CTA CAA CAC CCC GGA AGG GCG GGA AAC GGT
3420

3421 ACG TCA ACA CCC TAC CAT CCT TCT TGA TGA GCT TCG CTA CAC CCT CGT CAA GGT TTA TCT 
3480 

3481 CTA TCT CGT CGC CCT CCT CGG CCG CCT CCA CGA GCT CTG GGA GCA CTA TAA CGG GGA GCC
3540

3541 CGT TGT TAA TCG CGT TAC GGT AGA ATA TTC TCG AGA ACC TCT TCC CTA TGA TGG CCT TGA
3600

3601 CGC CTG CAG CCT TGA GAG CTA TCG CGG CTT GCT CCC TCC TAC TAC CCA TAC CAA AGT TCC
3660

3661 TAC CCG CGA CCA GCA CTA CAC CCT TGG ACG CCT TCT TGG GGA ACT CCG GAT CCA GAG GCT
3720

3721 CCA TAG CAT GCT CGG CAA GCT TCT CCG GCT CAG TAT ATA CCA GGT AGC GGG CAG GGA TAA
3780 



3781 TCA CGT CGG TGT TGA TGT TAT TGC CGT AAT TGA GCA CAG GGC CCT TCA CGA CAC CCA GGT 
3840 

3841 TCA AGA GAG GTT CAC CAC AAG TTT GGC CTC GCT ATC CCA GGC TAT AAT CCA GCT GTT TAC
3900

3901 TCG GCC AGC TTC ACC CAC ACA CTT TTC AAC TCC ATT ATC CTT GTA GCG CAA TCT ACC CTT
3960

3961 CTG GGT ACC ACA GCG TTA AGC CCA TAG TGC CAA GGC GCC ACA ATG ATG CCC TCC GGC ACA
4020 

4021 TTC TCG TCG GGT ATC AGC CGG AGG CGT ATG GCC CCT CTC TCC GTC TCG AGC CTA GCG TCA 
4080 

4081 CCG GCG CCA GCC TCC TTA GGG TTC ACT CGT GCG TAT AGC TCG CCG CTC ACA TCT AGC ATC
4140

4141 GCG TTT GTA CAG TAG CTC ACC GGG TCT CTT GCA GTC ACG AGC ACC TTC CTA TCA CCA TCG 
4200 

4201 GGC ACG ACC GGC TCG ACC GGC GGG TAT AGA CGG ACG CGT ATC CTC GAG ACA CCC CTG GGC
4260 

4261 AGG AGG TAC TCG CCT CTC TCC GCA ACC GCC TTG GAG GAA 4299

Thermococcus 9N-2 (31ph1)

SEQ ID NO:38



1 TGG ACT GAT AAA GAA AAA GAA GAG GTT TAA GGC CCT CAA TAT TAA ATT CTA CAC ATT AGA
60

61 TAT CCA AAA TGG AGA ATT ACT TAA TCT AGA GAC TTA CCT TAA GGA GTT ACA TGA GTT CCT
120



121 TAG AGG CCT TAC ATT AAA ACG AAA AGT AGA AGA GGA ACA ATG ACC CCC GAA GAG CTC CTA 
180

181 ACC CGC CTC GAA TTC AAA GGA GTA ACC CTC GAA AAG ATG CTC AAT ACT GCG TTA GAG CTC

240 

241 TAC ATC GGC GAC GAG CGC GAG AAA GTT CGA GAA AGG CTG AGA GAG CTG ATG CTG AGG TAT
300

301 CTG GGC GAC ATC AAC GTT CAA GCT CTG CTC TTT TCG GCT CTA CTC CTC GAA GAG AAC TTC
360

361 AAG GTT GAG GGC GAC CCC GTG AAC CTT GTG GCC GAC GAG CTC ATC GGC ATG AAC ATC GCC

420

421 GAG CTC ATA GGT GGA AAG ATG GCC CTC TTC AAC TTC TTC TAC TAC GAC ACC AAG AAG CCC 
480

481 GGC ATT TTA GCC GAG CTT CCG CCT TTC CTC GAC GAT GCG ATA GGG GGC TTT ATA GCG GGC

540

541 TGT ATG ACA AGG CTG TTC GAG GGG GTG TAC GGT GCG GAA TCT CTT ACC CTT CTT CAC GCG

600

601 GAT TCC GGT CAA AGG CAA CTT CAA AAG GGT TAG AAA TGA GCT CTG GGC ACT TCC CAT TCT 

660 

661 CGC ACC GGT AAC TTC GGC CCT GGC GAC GCT CGT GGG CTC TGT GCT CGC CGG GGT AAT AAT 

720

721 CCT GGG CGG CAA CTA CGC GTT TCA CCC AAC GTC TCG GCA ACC CAC GTG CTG ATA ACC CTC 

780 

781 ATA GGC TTC GTC GTG GTC TAC AGC ATA CTG TTC TAC ATC TGG CTC CAC TTC GTC AGG AAG
840

841 CTC ATC AGG GAG GGC CCC GAA CCG GTT GAG GGT GAC GTC ACC GCG AAG CCC ACC CCT GCC

900 

901 GTT AGC GCC GCG GGA GGT GGT CAG TGA TGG ACT ACG CGA CCG CAT GGT TTT ACT TCT CCG 

960 

961 CCT TCC TCC TCG GAA TGT ACT TAG CGT TTG ATG GCT TCG ACC TTG GCA TAG GCG CGT TGC

1020 

1021 TCG CCC TGA TTA AGG ACC AGA GGG AGC GCG ACA TAC TCG TGA ACA CCA TCG CGC CGG TCT
1080 

1081 GGG ACG GCA ACG AGG TCT GGT TCA TCA CCT GGG GTG CCG GGC TCT TCG CGA TGT GGC CGG 
1140 



WO 97/48416 PCT/US97/10784

1141 CGC TCT ACG CGA CGC TCT TCA GCA CGT TCT ACC TTG CCG TCT GCC TGC TCG CGT TCC TGT
1200 

1201 TCA TAT TCA GCG CTG TCG GCT TTG AGT TCA GGA ACA AGA ACA AGG AGC TAT GGG ACA AGC 
1260 

1261 TCT TCG CTC TCG TCA GCG CGT TAA TCC CGC TCG TCA TCG GCG TCA TAG TCG GCA ACC TCA 
1320 

1321 TCA TGG GAA TTC CCA TTG ACG CCA AGG GCT TCC ACG GCT CAC TGC TGA CGC TCT TCA GGC

1380

1381 CCT ACC CGC TCA TCG TCG GCC TCT TCA TAC TCT TCG CGG TGA CCT GGC ACG GAG CCA ACT
1440 

1441 GGG GCG TCT ACA AAA CCA CAG GAA AGC TCC AGG AGC AGA TGA GGG AGC TCG CCT TCA AGG 
1500 

1501 CCT GCC TCC TGA CCG TCG TCT TCC TCC TGC TCA CAG TCA TCG GCA TGA AAA TCT GGG CCC 
1560 

1561 CAC TGA GGT TCG AGA GGG CAC TAA CGC CGC TTG GGC TCC TCC TAA CGG TTG TCA TCC TCG 
1620 

1621 TGG CAG GAC TGC TCG ACG GAC AGC TCA TCA ACA AAG GGG AGG AGA ATT TGG CCT TCT ACA
1680

1681 TCA GCT GGC TGG CCT TCC CGC TCG TTG TGT TCC TCG TCT ACT ACA CAA TGT ACC CCT ACT
1740 

1741 GGG TCA TCT CGA CCA CCG ATC CGA ACT TCA AGC TCA GCA TAC ACG ACC TCG CGG CAT CTC 
1800 

1801 CGC TGA CCC TCA AGG CCG TCT TGG GAA TCT CGC TCA TCC TGG CGG TCA TCA TCA TGG CCT 
1860 

1861 ACA CCC TCT ACG TAT ACA GGG CCT TCG GCG GAA AGG TCA CCG AGG CGG AGG GCT ACT ACT 
1920 

1921 GAG TTC CCC TTT CCT TTT TCG ATA TTC GAA CTT TTT TAG GGA AAA GTT TAT AAT TCG AGT
1980

1981 CAC CTA AGT TCC TTC TGG AAA CCT AAA AAA CGG TGG TCG AAA TGC ACA GAG GCA GAT CTA
2040

2041 CCG GCT GGC CCT ACG ACC GGA AGC CGG TCC TCG TCT TCT GGG AAA CCA CCA AAG CCT GCC
2100

2101 GGC TCA AGT GCA AGC ACT GCA GAG CGG AGG CAA TAC TCC AGG CAC TGC CGG GCG AGC TCA 
2160 

2161 ACA CGG AGG AGG GAA AGG CCC TCA TCG ATT CCC TCA CCG ACT TCG GAA GGC CCT ACC CGA
2220

2221 TAC TCA TTC TCA CCG GTG GCG ACC CGC TCA TGA GGA AGG ACA TCT TCG AGC TCA TCG AGT
2280

2281 ACG CCG TTG AGA AGG GCA TTC GCG TTG GTC TCG CCC CCG CTG TAA CGC CCC TCC TGA CCC
2340 



2341 AGG AAA CAA TCG AGA GAA TCG CCA GGA GCG GAG TTA ACC CGG TAA CCA TAA GCC TCG ACA
2400

2401 GCC CCT TTC CAG AAG TTC ACG ACG CAA TCA GAG CCA TAG AAG GGA CGT CGC ACA AAA CCC
2460

2461 TCT GGG CCA TCA AGC ACT TCC TGA AAC ACG GCC TAA GCG TTC AGG TGA ACA CGG TTG TCA
2520

2521 TGC CCG AGA CCG TTG AAC GAC TGC CCC AGA TCG 'CA AAC TGC TTA AAG ACC TCG GCG TCG
2580

2581 AAA TCT GGG AGG TCT TCT ACC TCG TCC CGA CCG GGA GGG CCA ACT TCG AGA GCC ACC TGA
2640

2641 GGC CGG AGG ACT GGG AGG ACG TCA CAC ACT TCC TCT ACG AGG CCT CGA AGC ACC TCC TCG
2700

2701 TGA GGA CCA XCG AGG GCC CGA TCT TCA GGC GAG TCG CGA TAA TGA GGA AAG CCC TTG AGG
2760

2761 AGA AGG GAT TCG ACC CCG ACG AGG TTC TCA AGC CCG GGC AGC TCT ACT TCC GGC TGA AGA
2820

2821 AAC GGC TCG TTG AGC TTC TCG GCG AGG GGA ACC AGG CGA GCG CCC AAA CTA TCG GAA CGC
2880

2881 GCG ACG GGA AGG GAA TAG TCT TCA TCG CCT ACA ACG CCA ACG TCT ACC CGA GCG GTT TCC
2940

2941 TGC CCT TCA GCG TCG OCA ACG TCC GCG AGA AAA GTT TGG TTG AGA TTT ACA GGG AGA GTG
3000

3001 AAC TTA TGA AAA AGC TCC OCT CGG CCG ACT TCG AGG GGC GCT GCG GGA GGT GCG ACT TCA
3060

3061 GGG AAA TCT GCG GGG GAA GCA GGG CGA GGG CCT ACG CCT ATC GCT TAA ACC CGC TCG CCG
3120

3121 AAG ACC CTG CCT GCC CGT ACG AGC CGG GCT CAT ACC TAA GGC TCG CCA AAA ACT TCA ATC
3180

3181 TTC ACC TTC CGA TTG AGA TTT TTG GAG CCC AAA AGC CGA TTT GAG GTG ATG GAA ATG AGG
3240

3241 TGG AAG GCT GTT TTA CTG ATT GGA ATC CTC CTC GTG TCT GTC CTC GGT GCC GGA TGC GTT
3300

3301 GGC TCG AAT ACC TCA ACT GAA ACC GGC CCA TCC CAG AAG GAA ATA ACC GTG AAG GAC TTC
3360

3361 TCG GGA AGG AAC ATC ACG GCT AAA GTT CCG GTT CAG CGG GCG GTC GTT CTC TCG ACT TCC
3420

3421 GCC CTC GAA ATA ATC CAG CTC CTC AAC GCG AGC GAC CAG GTC GTC GGT ATT CCA AAG GAG
3480

3481 GCC CAG TAG GAC GCT TTA CTG AGC GAA AGC CTG AAG AAC AAG ACC GTC GTT GGC GCG AGG 
3540 

3541 CTC AAC ATT GAC GAC TGG GAG AAC CTT TTA GCC CTA AAG CCC GAC CTA ATC ATC GAC CTC
3600 

3601 GAC CTG AAG AAG TTC TAC AAC GTT GAC GAG CTC CTC AAC CGC TCC C.C AGC TAC GGA ATT 
3660 

3661 CCG GTC GTC CTG CTG AGG GAG GAT AAC CTT GAG GAC ATA CCG AAC CCG GTT TCG CTC CTC 
3720 

3721 GGT CAG CTC TTC GGA AGG GAG AAA GAG GCC AAG GCC TTC GAC GAC TAC TTC AAC GAG CAG 
3780

3781 GTG AAG GAG GTT AAG GCC ATA CCC TCA AAG ATT CCA GCG GAG GAG AGA AAG AAG GCG ATA
3840 

3841 ATG ATA CAG CCG ATA ATG GGC AAG CTC TAC CTC GTC AAC GGC AAC GAC GTC CTT GCT CAG 
3900 

3901 GCC GTC AGG CTC GTT GGG GCG GAC TAC CTC GTG AAC CTG ACC TTC AAC GGC TAC ACT CCG 
3960 

3961 GTT AGG GTC CCG ATG GAC GGG GAG AAG ATA ATA GCG AAC TAC CGC GAT GCA GAC GTC GTA 
4020 

4021 ATC CTC CTG ACG AGC GCC GTA ACG CCT TAC GAC CAG GTC GAG AAG CTC CCG GAG GAG ATG 
4080

4081 CTC AGC GAC GAG GCC TGG AGG GGC ATT AAG GCC GTC AGG GAG GGC AAC GTA GTA ATC CTC
4140

4141 AGG GCG GAC ATG GGT AAA GAC TCC TTC CTC CCC TGG AGC CCG CGC TTG GCA GTG GGA ATC
4200

4201 TGG GTC ATT GGA AAG GCA ATC TAC CCG GAC TAC TAT CCT GAC TGG AAC GAC AAG GCC AAG
4260 

4261 GAC TTT CTG AAG AGG TTT TAC GGC CTC TCC TGA TTT TTC TTT TCG GGT GGG ACG ATG ATA 
4320 

4321 GCG GTC TTT CCA GCG ACT CTC GCG GAA ATC GTC AAA CTC GTC GGG AAA GCC GGG GAG ATA
4380

4381 GCC GGA GTG AAC GAG GAA ATC AGG TTC GAC CCC TGC CTG CCG GAG CTG AAG GAT AAG CCT
4440

4441 GTC ATC GGA AAG TAC CTC AAG CGG AGC AAG AGG ACC TAC TGG GAC GTT TTA GAG GAG CTT
4500

4501 AGG CCG GAC CTT ATC CTC GAC TTC GAT GTT GAG AAC CTG CAC TCC GGG GAC GAG CTG AGG
4560

4561 GCC TTT GGG GAG CGT ATA GGG GCA AGG GTC GAG CTG ATT GAC TTC GAG ACC GTT GAA GGC
4620 

4621 TTC GTC GAG GCG AGC AGG AGG ATA GCC GAG CTA ACG AGG GGC GAC TTT TCA AAG CTC GGC 
4680 

4681 GGG TTC TAT GAG AAG CAC CTG ACG AGG CTG GGT GAG ATA ACT GAA GCC ATC GAG GAG AGG
4740

4741 CCT AAA GCC CTC CTC ACC TAC CGG AAC TTC AAC GTC GTA ACG AGG ACC AAC CT7 CTC AGC
4800

4801 GAC GCG GTT AGA AAA GCA GGG GCG ATG AAC CTC GGC GAG ACG ATA CGG ACA AAG CGG AAG
4860

4861 GTC TAT CCG GTA AAG AAG GAG CGC TTC TTC AGG TCC TTC GGC GAT GCG GAG CAC CTC TTC
4920

4921 CTG CTC ACG AGC ATA ATG ACG GAC AGG GAG AAA ATG GAG GGG ATA AGG GAT GAA ATC CTT
4980 



4981 GAC TCG GCC GAG TGG AGG GCA ATG GAA GCC GTT CAG CTC GGA AAC GTG CAC ATA GTT GGC
5040

5041 TCG GCC CTC GAC CTT GAG AGC TTC ATG CGC TGG ACT CCC CGC ATA ATC CCG GGA ATC TAC
5100

5101 CAG CTT GGA AGG TTT ATA CAC GGA ACA AAT CAC CCA CGA ATC TCG TGG AAA TCA CTG CAA
5160

5161 AAG TTT AAA ATC CCC CTC CCA CCC CTC GAA GAA CAA AAA CGC ATC GTC GCC TAC CTC GAC
5220

5221 TCG ATA CAC GAG CGC GCC CAA AAG CTG GTA AAG CTC TAC GAG GAG CGG GAG AAG GAG CTT
5280

5281 GAG AAG CTT TTC CCC GCG GTG CTT GAT AGG GCG TTT AGG GGT GAG CTG TGA TTC CGG GAA
5340

5341 TGG AAT ACG CCT TTG AGA GGG CAA TCT TTG AGA TAG TCA GCG GCT TTG TTC TCT CCC TCG
5400

5401 TAG TCA GGG CTT TCG CTT ACA GTT TTG GTC TTC CAT GGG TAT CCT TTT TGT TCA ACG TTC
5460

5461 TTT CGA TAC TTC TGA CAA TAG GCC TGA TTG ACA AAA TGC CCT TCT GGT CCA TGT CAT ATC
5520

OC1/4V (33ph1)

SEQ ID NO:39

1 AGC TTG GAT ATC GAA TTC CTT ATA TGA AAA ATT CAT CCA ATT GGT AAA AAA CCA CGA TCT
60

61 TCA TGT GGA AAC TGG AAT ATT TGC TCC GCA TAT GCT TGT GGA AAT ACA TAA CGA TGG TCC
120

121 GGT GAC TTT GTT ACT TGA TTC AAG AAA AGG TAT TTT GAA GTC ATC TTT GCT GTC TCT AGG
180

181 AGG ACT ATA TGC CTG AAT ACT CGC ATA GCA ATA AAA ACA ACT TTT TTG CCG AAA ACG ATG
240

241 TGA AGA ATT GTC ATC TAC TGC ATG TAT GTT GTC CAC CCG ATT TGG CAA TTT CTT ATT TGT
300

301 CCG GTG CAC GTC GTG ATA TTT TCT TTT ACA ATC CTA ACA TAC ATC CAA AAG CTG AAT ACG
360

361 AGA AAC GAC ACG CCG AAG TCA TTA AAA TTG CTG CAC TCT TTA AAA TGA ATG TTC TGA AAG
420

421 TTC CTT ATA ATC CTG ACC TGT TCT TCA AGC TTA CTA AAG GAT TAA AAA ATG AAC CTG AAG
480

481 GCG GGA CAA COT GCC AGA TTT GTA TAA GAA TGC GAC TAG AAA AAA CAA TGG AAT ACG CGA
540

541 AAG AAA ATG GCT ACA AGA GTG TTT CCA CAA CGC TAA CAG CCT CTC CAA AGA AAA ATG TAG
600

601 CGA TGA TTG TGA AGA TAG GAA AAG AAC TGG AAA AAA AAT ACG GTG TGG AAT TTT TGC CTA
660

661 ATG TGT ACC GCA AAA GTC CGC TTT ACA ACG ATG CGC AAA AGC TTA TAA CGA AAA TGG GTT
720

721 ATT TAC AGA CAA AAC TAC TGT GCT TGT ATT TTC TCA ATA AGA ACT TCC CTT ATA GTA GCC
780

781 ACT CAA GAA ACT AAA ACC GTA AAA AGT GGG GTC GAA GTA TGA AAA TAT ACC ACA AAT TAG
840

841 AAG AAG TTG AAG AAC ATA AGC GGT COT ATG CAT CAA TTG CTT TTT CAT CCA AAG TCA GCG
900

901 TTG AAT ATG AAC ATG CTG GCG AAA AAC TTG CCC TCA TCC CTG TAX CTA TTG GAG ACC TTA
960

961 CGG TGG TTA TCG AAA TTG ACG ATG ATA GAG AAG TAT TCA ATA CTT TGT TGA ACG AGC ACA
1020

1021 TCA AAA ACT CTA TCC TGA AAC AGT TTC CGT ATC CGG AAG AGA TTA GAG GGT TAG CCA GAC
1080

1081 ATT TTC GCA CAG AAT TGA AGA ATT TCA GAA TCT TGG TTG TAA AAT ACA ATA GTG TCG AAG
1140

1141 AAA AGG AAT TCT CAA GGT ATT CAC TGT CTA ATA TAA CAT TCG GTG TGG TCT CAT ACA ATA
1200

1201 AAT TTG ATG TCC ATT TGT TAC CAA GTA ATG TAA AAG TCA CAC CGA AGC CAG GAT ACT GTC
1260

1261 TTT CAC ATG TTG TCC AAA AGC CTG AAG AAG GTA TCA CGC AAG CAT TCT TGT TAG CCC GGT
1320

1321 GGT TTG GTG GTG GAA GCT ACG ACC AAC TGC CCA AAT TAG CGC TTG AAA GCA CTG ACA TTG
1380

1381 ACC TTG GAA ACT GCA CAA ATA TAG TCA AAT ACA TCG TTC TCT CAG ATT TTG AAA AGA GGT
1440

1441 ATT TTT CTG GTA TAA TAA AAA AGC TAA ACG AAT TTA GAA GCG AGA CAT ATT TTG ACC CAT
1500

1501 TTG CTA GGC TTG AAA TGA TAT CAC TTG GCA TAA TAC TCG CCA AGT CAG AGG GAG GAG GTA
1560

1561 ACT TTG AAC CAG ACA GTT ACG ATA TCA TTT AGA GCA CTT ACT GAA AAT ATA AAA TTA GCA
1620

1621 CGA GTT GTT ATA CAT ACT TTT CTA ACA TTC CGA GGA CTG TTC GAT AAA GAT ATA TTC CAT
1680

1681 ACG GAA TTG GCT GTA AAC GAA GCG ATT GCA AAC ATT ATT CAC CAT ACA TAC AAA GGT GAA
1740

1741 CCA AAC TAC GTT GTG ATG ACG CTC AAT TGG ATA GAA CCA GAT ACA CTC GAA GTG TTA CTC
1800

1801 CGC GAT TTT GGT CCA AAA CTG GAC CCA ACG AAA ATC AAA CCA CGA GAT TTA GAT GAT ATC
1860

1861 AGA CCA GGA GGA CTC GGA GTT TAT ATA ATT CAA CGC ATC TTC GAC ATT ATG GAA TTC CGA 
1920 

1921 AAC GTG AGT CAT GGA AAT TTA CTT TAT CTA AAA CGC TCC TTC TTA ATA CCT CCT AAA AAG 
1980

1981 CAG GAG CTT GGG AAT TTA AAT AAT GAA CCC TAT CGA GAA TAT TGA AAA AAC CGT CAA AAC 
2040 

2041 GGG GGA AAG AAC ACA AAT GGG CTT CCT CAC AGG TTT GAC AAA AAA TCC ATC TTT CAT GTC 
2100 

2101 TGC ATT TTT TGG CTT TTT GGC AGC ACA ATT TTT GAA AGT GGT GAT ATA CAA AGA TTT CCG
2160

2161 CGT ATT TGG TAG ATA CGG TGG TAT GCC CAG TGC TCA TGT TGC AAC AAC CTC AGC ATT AGC
2220 

2221 TTG GGC TGT TGG TTA CAC TAC AGG TTT TGA TTC ACC GCT TAC AGC CAT CGC TGC AAT TTT 



2280 



2281 CCT TGC TAT TAC AAC AGC TGA TGC TGT TGG TTT ACG AAG AAA TGT CGA CCC CAA TAA ACG 
2340 



2341 ACA TAC ACT AAT GGA AGC TAT CTA TGG CTT CTT ACT TGG GTG GAT AGT CCC TCT CCT TAC
2400

2401 CGT TAA GTT GTA TCC ATA ATT TTG AAT GAG TTC TAG TGA AAT AGC XA AGT C7~ TTT TCG
2460

2461 CAA TTA CAT CAT AAT GCC AGG AGG GTA ATT TAC AAT GTT TTT TAG ATT ACC ATT TAA AGT
2520

2521 TTT TGT TTT TGC AGT TTT GTT GCT TGC CAT CTC GTT AAC AAG TGT TGT TAG TTT TGG ACA
2580

2581 AGA TGA TGA GCA GAT AAA AAC ACC AAA TTG GTT TAG AAG TGC GGT GAT TAA GAA AAG AGC
2640

2641 TGG TAT GAA TCT AAA GAC CGC CCC AGA GTT TGT AGA TGA CCT ATC GAA TGC GAT ATA CAC
2700

2701 TAT AGG CAC AAA ATA CAA CGT TCC CCC AAC GCT TAT ACC CGC TGT CAT TTC TGT AGA AAG
2760

2761 CAA CTT CGC CAA CGT GAA AGG TGC TGG AGA CGT GGT AGG AAT GAT GCA AAT TTC TAT CTC
2820

2821 CAC AGC CAA AAA TAT ATC GAA ACT CCT CGG CCT CGA ACA ACC AAA AAA CGC TTG GGA TGA
2880

2881 GCT CCT CAC AAA TTA TTG CTT GAA TAT AAC TTA CGG TAC CCC ATA CAT CCC TTA TCT TTA
2940

2941 CAA AAA GCA TGG AAC TTT ACA GAA AGC GCT CGA AGA ATA CAA CAA CGG AAA AAA TAA AAC
3000

3001 TAA ATA CGC CCA GCT GAT ACT ACA ACA ATA CAA CCT ATA CGA GAG CCT CCA TTC TGC TGA
3060

3061 AAT AAG AAA TAA CCA GCA ATT GGA TAC AGA TAA TTC TTC GAC ATC TTC TGA AGC AAC AGA
3120

3121 TAC TTT GAA TAC AAC CAG TGC AAC AAA TTC ACA ACC AAC ATC AGA TGC ATC AAA TAC ATC
3180

3181 AGT TAA CAC TTC AGA AAT CAA GTT CCC GCC TCT TTT CGG AGT TGC AGG TTA TTA AGA TAT
3240

3241 TTG TTC GGT ACT TAC TTA GGA ATG TGG GGT GTA TAG TTT GGA AGA TGA AAA AAT GAA ACC
3300

3301 TGA AAC GAT AGT AAA AAT TGA ACA TTT ATC TTT TTC TTA CCC GAG TTT CAG TCT CAA AGA
3360

3361 TGT AAG TTT TGA GGT TCG GAA GGG AAG TTT CTT CGG CAT TAT TGG ACC AAA TGG TTC GGC
3420

3421 AAA AAC CAC GCT ACT CTC ACT CAT TAT GAA ATT CCA AAA GCC AAA AAG TGG GAA AAT AAC
3480

3481 AGT TGA TGG GAA CGA TGT GCT CAG GCT ATC TCA CAA AAA ACT TGC ACA ACT TAT AGC ATA
3540




3541 CAT CCC TCA AGA CTT TAA CCC TAC ATA CGA TIT CAC ACT TGA AGA ATT GGT CGA AAT CGG
3600

3601 AGG AAT CCC CCG CTC ACC ACA TTT TTT CGA AAC ACC TCT TTA CGA GGA AGA ATT AGA AAA
3660

3661 TGC ACT CAA AAC TGT TGA TTT GCT TGA ATA CCG AAA AAG AAT ATT CTC CAC TCT TAG TGG
3720

3721 AGG ACA ACA GCG CAG GGT CTT GAT TGC ACG CGC AAT CTA TCA AAA CAC ACC TAT CAT CAT
3780

3781 TGC TGA TGA ATT GGT TAA TCA CTT GGA TTT AGG CCA AGC AAT TAA AGT GTT AGA TTA TCT
3840

3841 AAA ACA ACT TAC CGA ATG TGG AAA GAC GAT AAT TGG ACA TTC CAC CTG CAG CCC GC 3896

Archaeoglobus lithotrophicus TF2 (5ph1)

SEQ ID NO:40

1 ATG TCC TGC AAG GCG ATT AAG TTG GTA ACG CCA GGT TTT CCC AGT CAC GAC GTT G7A AAA
60

61 CGA CGG CCA GTG AAT TCT AAT ACG ACT CAC TAT AGG GCG AAT TGG GTA CCG GGC CCC CCC
120

121 TCG AGG TCG ACG GTA TCG ATA AGC TTG ATA TCG AAT TCC GTA CGA AAT GCG CGA AAG AGA
180

181 GAA GGA AAA GGA AAG AGA GCA CAG ATT TCG AAA TGA CAC AGA ACA CGA CCA AGA GCA TGG
240

241 TAT GGC AGA GCG TGA AAG AGC ACA TGA GAA CGA GTC TGA AGA AAT GGG CAA GGG CGT TGG
300

301 CAT GGG CGC CCA TGG AAT GAA GAT GGG CAA AGA AGC TCG CGA AAT GGT GAA GGA AGA ATA
360

361 CAA GGA AGC AAA GGA GAG ATA CAA GAA GGC TAG AGA AGA GTT TGA AAG AGC AAA GAA GAT
420

421 GGG ATT GGA CAT CAC AGA GGA GCG CGG ATT CAA GAT GGC CAA GGG ATT CAT GGT AGC TGG
480

481 ACT AGA CGT TGC TGA GAT GTG GCT GGA GAG ACT GAA GGT ACA CGT CAT GAA TAT GGG TGA
540

541 AGA GGC CAA GAT CAC AGA GGA GAC CAA ACT GGA GCT GCT CGC AAA GAT CGA CGA GAA GCT
600

601 TGC AGA AAT CAA AGA GCT GAA GAA CGA AAT CAA TGA GAC CTC CTC ACC TGA AGA GCT GAT
660

661 AGA AAC TGT CAA GAA AAT CAG AAA GGA GTG GAG AGA AAT CAG AGA TGA AAT GAG GGC TCT
720

721 TAC TGG CTA TGT CGC CGT TGC CAA GGT GGA AAA GCT TGT TGA AAA GGC CAA GCA GGT AGA
780

781 GCT AAT GCT TGA GGC AAA GAT CGA GGA GCT CGA TGC TGC AGG AGT TGA TAC AAC CAA ACT
840

841 CGA GGC AAC ACT CGA GGA CTT CTC GGC AAA GGT TAA TGA AGC AGA AGA TTT GAT TGA CAA
900

901 GGC TGA AAA TCT GTT CGA GGA AGG CAA CAT TGC TGA AGG ACA CAT GAC TCT CAA GGA AGC
960

961 CAT AAA GAC TCT CAA GGA AGC CTT CAA GGA TGT CAA GGA AGT TGT CAG CGA GAT GAA GGA
1020

1021 AAT GAA CCA GTA TAG AGT TAG GGA GGG CAA GAT CTT CTA CGG AAA CGA GAC TGG AGA AGT
1080

1081 CTG GGT GGA TGG TAA TGG TAC TGC TGA GTT TAA CGG TAC CGG TAT CGT TGT GAT CAG AGG
1140

1141 AAA CGC AAC ACT TCA GGT CGC ACC AGA AGA TCC GAT CGT GAC ACT GCT CGG CTT CGG CCT
1200

1201 GAA GAG CGT TGA GGG TGG CGT TTC AAG ACT CAG CGC AGA AGG TAA GGC ACT AAT CAC ACC
1260

1261 AGA AAA CCT CAC CGT CAA GGT GGA AGG TGA CGA CTT CAA GCT CAT AGT GAA GGG CTA CGG
1320

1321 TAC ACT CAA ACT CGA TGG TGA GGG TGA ATA CAG GGT AAA GAA GAG CCC ACA GGA AGA GAT
1380

1381 GAC ATT TAA ACT CTT TCT TCA ACT CTA GCA GTT TGA GCA TTC CAT TTC CAA GAT TTT TGC
1440

1441 TGT TAG CTT CGG GAC AAC TTT GAA AAT ACG TCG AGA CAG GCT CAA ATG TTG TCC CAG CAT
1500

1501 TGC AGC TTT CGG CAA AGC GAA CGA GAT TTG CGT TCC GCT CCC CAG CCC AAC ATG GCT TCT
1560

1561 GTA ATC TGA AAA AAC TTC AAG TTC AAC AGC TTT CCC AAA AAC ATC CAA AAG CTT TTC CCC
1620

1621 AAC ACT TCT AAA TCT TTC GAG ATT TAT TGC ATT TCC TTT CAC CGA AAT GCT ATC GGA TTC
1680

1681 TCT TCC CAC AAC CTC GAT ATG CGG CTC TTC CAG AGC AAT ACC CAC TCC ACC GTC AAT CCT
1740

1741 TCC AAC CTG GCC GTT CAA ATC AAT GAG CGT GAT ATG AAT TCT CGA CGG AGT TTT AAC CTT
1800

1801 AAC ATA CAT CTA TAG AAT TTA AAC GGT AAT TAC TTA AGA AGT TTT GGT TTT GCG AAA AAG
1860

1861 AGT TCA AAA TTC ATT CTT TTA ACT GCA CTA CAG CTC ATC TGT GCC TTT TCT CCT TAA TTC
1920

1921 GAT TTT TCT GAG ATA GTT CTG GTA TCT CGT ATC AAC TAT GTA AGC CTC GGG AGC TAT TAC
1980

1981 AGG CAG ATG ATA ACC GGT GAA TAT CCT TAT TAT CTC TCC AGC CTG AAC CGA GCA TGT CAG
2040

2041 TGC ATA TGA TAT CGG ATC GTG ATC GAT GTG AGG ATA CTC CAC CTC GAA GAA AGA CAC ACC
2100

2101 ATC AGG CAG GAA AGT AGT AAT TAT ATC GGG AAT AAA TGG AGC TCC GAG CTC TTC AGC AAC
2160

2161 TTT TGC AGC CAT TGA AAT CTG CTT ATG AGC AAC AAC AAC ATC AAT ACC TTT CAA CTG TCT
2220

2221 CCT GAG TTC TTT ATA ATC ATG CGG GAA GGG ATA AGA GAT TAT ACA CGA ATC AGA ACT CAT
2280

2281 AGG ATG CAC AAC ATC ATA ATC GTT TCC CTC AAG TGG CTT TAT GCT GGC ATC AAG CCT CAC
2340

2341 ATC CAT TGG TGT AAC TAG ATC TCC AAT ATA CCG AAT GCA ACC AAC ACC ACT TCT CCA GAG
2400

2401 CAA TTC CAT GAG CAT TCT GCT TCC GAT GAC AGC GAC ACT AAA GTT CCT GAG ATA ATC TAT
2460

2461 CTT TTC TTC ATC TGC CAT CCC ATA CCA GGA AAT TTT TCT CAT GGC AAT AGC CCC GCA TCC
2520

2521 ATT AAA TGG TAT TAA TTT TTT GCC GTA TTT TGA GGA GGT AGA TAT TAA CCA ATT ATT TTC
2580

2581 AAA CCA TTT AAG GGC ATC GAT GAA ACA TCC CAA AAC CAG TTC AGC AAA AAA TTA AAT CAC
2640

2641 TGC CAC ACA TTC AGG ACC CCA AAA TGG TGT GAG AAA TGG ACG AAC TGG GAG GAG TTA TTT
2700

2701 TTG ATC TGA TAG AAG AGG AGC CCG AAG TTG AGG AGG ACG ACG AGA TTA AGC TCG CAG AGA 
2760 

2761 TAT ACA GGC TTG CTA CAA AAC TTA TAA AGT TAC TCG AAG ATC TCA AAA GCC ATG AGC TTA 
2820 

2821 AAG AGT CAG CAT CTC TTA TGC TCA TAA AGG AAA TTA TCG SVZ AAG ACA GAG TTC TGG TTG
2880

2881 GTT TAG CAT CAA AAA TGC TCC AGG ATA TGA GTC TCG GGT TCG AAG AGG ACG AAA AGT ACG
2940

2941 TTT CTT GAT TTT TGA ACT GTA TTT TCT ACA TGC TCT TTT CCC AAC CAC ATT CAG TTG CAT
3000

3001 GCC ATA CGA AAA TTC CAA TGC CCA AAT CCT GGT AAA TGT ACT TTT TCA TAG TAA ATG CTG 
3060 

3061 CCA AAC CCA GAT TAA ACT CAA TTT CAT CAA CAG GAA AAA GAA AGA ACG AAA AAA AGA CCT 
3120 

3121 ACA ACA GTC CTA TAA TTG ACC AAA CTT GAT AGA TTA CAA ACA CCA CAG TTC GAA TCA AAG 
3180 

3181 CAC AGA TGA AAG CTT TCC GGA TTC CTG CAG CC 3212 

Methanococcus thermolithoautotrophicus SN1 (11ph1)

Nucleic acid - SEQ ID NO:41
Amino acid - SEQ ID NO:42



1 ATC GAA ATA ATA AAC AAA TTT CTA AAA AAA ATT CGA TAT AAG AAA GAT GGA GAA CAA AAA
60
1 Met Glu Ile Ile Asn Lys Phe Leu Lys Lys Ile Gly Tyr Lys Lys Asp Gly Glu Glu Lys
20

61 AAG GAC AAA TCT AAA ACC AAA ATA AAA ATT GAA GAA GAA AAA ACC ATG GAT ATC GAA ATT
120
21 Lys Asp Lys Ser Lys Thr Lys Ile Lys Ile Glu Glu Glu Lys Thr Met Asp Ile Glu Ile
40

121 CCA AAA ATT GAA CCT ACT GAA AAT TTT AAT CGT GAT GAA ATT GTT TTT GAG GAA CAT AAT
180
41 Pro Lys Ile Glu Pro Thr Glu Asn Phe Asn Arg Asp Glu Ile Val Phe Glu Glu Asp Asn
60

181 GCC TAC GGT ATA TCC CAC AAA GGA AAT AGA ACA AAC AAC GAA GAC AAT ATT TTA ATT AGA
240
61 Ala Tyr Gly Ile Ser His Lys Gly Asn Arg Thr Asn Asn Glu Asp Asn Ile Leu Ile Arg
80

241 AAA ATA AAA GAT ACC TAC ATA TTA GCA GTT GCA GAT GGT GTC GGA GGG CAC AGC TCA GGA
300
81 Lys Ile Lys Asp Thr Tyr Ile Leu Ala Val Ala Asp Gly Val Gly Gly His Ser Ser Gly
100

301 GAT GTT GCA TCA AAG ATG GCA GTG GAT ATT TTA GAA AAC ATT ATC ATG GAA AAA TAC AAT
360
101 Asp Val Ala Ser Lys Met Ala Val Asp Ile Leu Glu Asn Ile Ile Met Glu Lys Tyr Asn
120

361 GAA AAC CTA TCA ATT GAA GAG ATA AAA GAA CTT TTA AAA GAT GCA TAC ATT ACG GCA CAC
420
121 Glu Asn Leu Ser Ile Glu Glu Ile Lys Glu Leu Leu Lys Asp Ala Tyr Ile Thr Ala His
140

421 AAC AAA ATA AAA GAA AAC GCT ATT GGA GAT AAA GAG GGA ATG GGA ACA ACA CTA ACA ACT
480
141 Asn Lys Ile Lys Glu Asn Ala Ile Gly Asp Lys Glu Gly Met Gly Thr Thr Leu Thr Thr
160

481 GCA ATA GTT AAA GGG GAT AAA TGC GTT ATA GCA AAC TGC GGG GAT ACT AGG GCT TAT TTA
540
161 Ala Ile Val Lys Gly Asp Lys Cys Val Ile Ala Asn Cys Gly Asp Ser Arg Ala Tyr Leu
180

541 ATT AGA GAT GGA GAA ATA GTT TTT AGA ACA AAA GAC CAC TCT TTG GTT CAG GTT TTA GTA
600
181 Ile Arg Asp Gly Glu Ile Val Phe Arg Thr Lys Asp His Ser Leu Val Gln Val Leu Val
200

601 GAT GAA GGA CAT ATT TCA GAG GAG GAC GCA AGG CAT CAT CCA ATG AAA AAT ATC ATT ACC
660
201 Asp Glu Gly His Ile Ser Glu Glu Asp Ala Arg His His Pro Met Lys Asn Ile Ile Thr
220

661 TCA GCA TTG GGA TTG GAT GAA TTT AAG GTA GAT GAT TAC GAA TGG GAT TTA ATT CAT GGT
720
221 Ser Ala Leu Gly Leu Asp Glu Phe Lys Val Asp Asp Tyr Glu Trp Asp Leu Ile Asp Gly
240

721 GAT GTA TTA TTG ATG AGC TCC CAT GGG CTT CAT GAT TAT GTC AGT AAG GAA GAT ATT TTA
780
241 Asp Val Leu Leu Met Ser Ser Asp Gly Leu His Asp Tyr Val Ser Lys Glu Asp Ile Leu
260

781 AAA ACT GTA AAA AAT AAT GAT CAC CCA AAA GAT ATT GTA GAT GAA TTA TTC AAT ACT GCA
840
261 Lys Thr Val Lys Asn Asn Asp His Pro Lys Asp Ile Val Asp Glu Leu Phe Asn Thr Ala
280

841 TTA AAA GAG ACA AGG GAC AAT GTG AGT ATT ATT CGT ATA 879
281 Leu Lys Glu Thr Arg Asp Asn Val Ser Ile Ile Arg Ile 293

Pyrolobus fumarius 1A (1ph1)

SEQ ID NO:43 - nucleic acid
SEQ ID NO:44 - amino acid



1 ATC ACT CTG CTA GCC CTG TAT CAG AAT AAA CGT GTT ATC GTC AAG CTT GGC TGC GGG ACC
60
1 Met Thr Leu Leu Ala Leu Tyr Gln Asn Lys Arg Val Ile Val Lys Leu Gly Trp Gly Ser
20

61 GGC ACT AGC CAA ATA ACT AAC GAG GCG CAA GTG CTG AGC CTA TTG CAC CAT ATG CCT ATA
120
21 Gly Thr Ser Gln Ile Thr Asn Glu Ala Gln Val Leu Ser Val Leu His Asp Met Pro Ile
40

121 GTG CCC AGA CTG CAT ACC CGT CTA GAC TTA GAT GAT GTC AAG CTC GTT GCG ATA GAG TAC
180
41 Val Pro Arg Leu His Thr Arg Leu Asp Leu Asp Asp Val Lys Leu Val Ala Ile Glu Tyr
60

181 ATA CCC TAC AAG AGC CTT AAC GCC GTC GGC CCC TTG AAC CCC CTT AAG GCT GTC ACA GCC
240
61 Ile Pro Tyr Lys Ser Leu Asn Ala Val Gly Arg Leu Asn Pro Leu Lys Ala Val Thr Ala
80

241 GTC TTC TAT ACA CTC GCA TCG CTA GTC CAT ATC CAC GGC CGT GGT TTT GCT CAT TCC GAC
300
81 Val Phe Tyr Thr Leu Ala Ser Leu Val His Ile His Gly Arg Gly Phe Ala His Cys Asp
100

301 CTA AAG CCG GGT AAC GTT ATA CCA GTT CCC AAG CGT GGC ATG GTG TTC ATC CAC TTT GGT
360
101 Leu Lys Pro Gly Asn Val Ile Pro Val Pro Lys Arg Gly Met Val Phe Ile Asp Phe Gly
120

361 GTT GCA CGA CCT TTT GAC GCT GCG GGC TTC GCG GCA GGA ACA CCA GGG TAT ACG TGC CCA
420
121 Val Ala Arg Pro Phe Asp Ala Ala Gly Phe Ala Ala Gly Thr Pro Gly Tyr Thr Cys Pro
140

421 GAG GCT CTC GGC GGC GAG ACC CCC GGC TCT GGC TGC GAT CTC TAC AGC CTT GCC GGC ATA
480
141 Glu Ala Leu Gly Gly Glu Thr Pro Gly Ser Gly Cys Asp Leu Tyr Ser Leu Ala Gly Ile
160

481 TAC TAC TAC TTG GTT ACC GGG TTA AGC CCG CCA CGC GAC CCA AAA GAG TTC CCC AAG GCG
540
161 Tyr Tyr Tyr Leu Val Thr Gly Leu Ser Pro Pro Arg Asp Pro Lys Glu Phe Ala Lys Ala
180

541 CTC TCG TTG GCT CCC GCT CCA ACT AGC CTC TTG GAA CTG TTC ACA CAG CTG GTG CTG GAT
600
181 Leu Ser Leu Ala Pro Ala Pro Ser Ser Leu Leu Glu Leu Phe Thr Gln Leu Val Leu Asp
200

601 CCC GAG TAT CGT AAC AGC CTT GAT CCT CTC CAG CTG TTG AAG ATT GTT GCA TCT TTT AAC
660
201 Pro Glu Tyr Arg Asn Ser Leu Asp Pro Leu Gln Leu Leu Lys Ile Val Ala Ser Phe Asn
220

661 CCG CAA CTG CTA GTC CCT CAT ATC GTT ATA GAT GGT GTT TAC AAG CCG CTA GGT TAC GGC
720
221 Pro Gln Leu Leu Val Pro His Ile Val Ile Asp Gly Val Tyr Lys Pro Leu Gly Tyr Gly
240

721 GAG GTA AGC ATA GGC TCT AGA GGC GTT ATA CGT GTT GAT GGA CGA CCA GTG TAC CTC GCG
780
241 Glu Val Ser Ile Gly Ser Arg Gly Val Ile Arg Val Asp Gly Arg Pro Val Tyr Leu Ala
260

781 GTT AAG AGG CAT GTG AGO GGC ACA AGT ATG TAC GCG TAT ACG GAT CTT GTC GTG TTT AGG
840
261 Val Lys Arg His Val Arg Gly Thr Ser Met Tyr Ala Tyr Thr Asp Leu Val Val Phe Arg
280

841 AGA GGC GAG AAA CTC ATA GTG AGA AGC GGT GAG AG7 ATA GAC CTA GAG 7TT AAC GAC C73
900
281 Arg Gly Glu Lys Leu Ile Val Arg Ser Gly Glu Ser Ile Asp Leu Glu Phe Asn Asp Leu
300

901 GTG TTG TTC GAC AAC CAC ATA CTA TAC CTA TTT ATC CTT CCG CAA ACC CCC 951
301 Val Leu Phe Asp Asn His Ile Leu Tyr Val Phe Ile Leu Pro Glu Arg Pro 317

Thermococcus celer (25ph2)

SEQ ID NO:45-nucleic acid
SEQ ID NO:46-amino acid



1 ATG GAC ATC AGG GCC GTT GTT TTT GAC CTC CAC GGG ACG CTT GTG GGT GCT GAG AAG ACT 60
1 Met Asp Ile Arg Ala Val Val Phe Asp Leu Asp Gly Thr Leu Val Gly Ala Glu Lys Thr 20

61 TTC AGC GAG ATA AAG TCC GAG CTT AAA GAA CGG CTG ATT TCC TTA GGG ATT CCC AGG GAG 120
21 Phe Ser Glu Ile Lys Ser Glu Leu Lys Glu Arg Leu Ile Ser Leu Gly Ile Pro Arg Glu 40

121 CTC GTT GGA GAG CTA ACG CCG ATG TAT GAG GGC CTT ATC GAG CTG TCC AGA AAA ACG GGC 180
41 Leu Val Gly Glu Leu Thr Pro Met Tyr Glu Gly Leu Ile Glu Leu Ser Arg Lys Thr Gly 60

181 AGA CCT TTC GAA GAG ATG TAC TCA ATT CTC CTC AAT CTT GAA GTT GAA AGG ATA AGG GAC 240
61 Arg Pro Phe Glu Glu Met Tyr Ser Ile Leu Val Asn Leu Glu Val Glu Arg Ile Arg Asp 80

241 AGC TTT CTC TTC GAG GGG GCA AGG GAG CTC CTC GAC TTT CTT GTG GGG GAG GGA ATA AAG 300
81 Ser Phe Leu Phe Glu Gly Ala Arg Glu Leu Leu Asp Phe Leu Val Gly Glu Gly Ile Lys 100

301 CTT GCC CTC ATG ACC CGG AGC TCC AGA ATG GCT GCC CTT GAG GCC CTG GAG CTT CAC GGC 360
101 Leu Ala Leu Met Thr Arg Ser Ser Arg Met Ala Ala Leu Glu Ala Leu Glu Leu His Gly 120

361 ATT AAG GAC TAC TTT GAG ATT ATT TCA ACG AGG GAT GAT CTC CCT CCC GAG GAG CTG AAA 420
121 Ile Lys Asp Tyr Phe Glu Ile Ile Ser Thr Arg Asp Asp Val Pro Pro Glu Glu Leu Lys 140

421 CCG AAT CCT GGC CAG CTG AGG AGA ATC CTC GGT GAG CTC AAC GTT CAA CCA GAG AAA GCC 480
141 Pro Asn Pro Gly Gln Leu Arg Arg Ile Leu Gly Glu Leu Asn Val Gln Pro Glu Lys Ala 160

481 ATC GTC GTT GGA GAC CAC GGC TAC GAT GTC ATC CCT GCC CGG GAG CTC GGC GCT CTG AGC 540
161 Ile Val Val Gly Asp His Gly Tyr Asp Val Ile Pro Ala Arg Glu Leu Gly Ala Leu Ser 180

541 GTC CTT GTC ACC GGC CAC GAG GCT GGC AGA ATG AGC TTT CAG GTT GAA GCC GAG CCA AAC 600
181 Val Leu Val Thr Gly His Glu Ala Gly Arg Met Ser Phe Gln Val Glu Ala Glu Pro Asn 200

601 TTT GAG GTC GAG AAC CTC ATT CAC CTC AGG AAG CTC TTC GAG AGG CTC CTG TCG AGC TAC 660
201 Phe Glu Val Glu Asn Leu Ile His Leu Arg Lys Leu Phe Glu Arg Leu Leu Ser Ser Tyr 220

661 GTT GTT GTT CCC GCT TAC AAC GAG GAG AAG ACC ATC AAG GGG GTA ATA GAG AAT CTT CTC 720
221 Val Val Val Pro Ala Tyr Asn Glu Glu Lys Thr Ile Lys Gly Val Ile Glu Asn Leu Leu 240

721 AGG TAT TTC AAA AAG GAC GAG ATA ATC GTC GTG AAC GAC GGC TCC AGG GAT AGA ACG GAG 780
241 Arg Tyr Phe Lys Lys Asp Glu Ile Ile Val Val Asn Asp Gly Ser Arg Asp Arg Thr Glu 260

781 GAG ATA GCT CGT TCT TAC GGA GTC CAC GTT CTT ACC CAT CTC GTC AAC AGG GGG CTT GGT 840
261 Glu Ile Ala Arg Ser Tyr Gly Val His Val Leu Thr His Leu Val Asn Arg Gly Leu Gly 280

841 GGG GCC CTC GGA ACG GGC TTT CCC TAT GCC ATC ACA AAA AAC GCC AAA CTT CTC CTC ACA 900
281 Gly Ala Leu Gly Thr Gly Phe Ala Tyr Ala Ile Arg Lys Asn Ala Lys Leu Val Leu Thr 300

901 TTT GAT GCC GAC GGC CAG CAC CTT ATA AGC GAC GCC CTC CGC CTC ATG AGG CCA GTT CCC 960
301 Phe Asp Ala Asp Gly Gln His Leu Ile Ser Asp Ala Leu Arg Val Met Arg Pro Val Ala 320

961 GAG GGC AGG GCG GAC TTT GCG GTC GGC TCA AGC CTC AAA GGT GAC ACG AGC CAG ATG CCC 1020
321 Glu Gly Arg Ala Asp Phe Ala Val Gly Ser Arg Leu Lys Gly Asp Thr Ser Gln Met Pro 340

1021 CTC GTG AAG AAG TTC GGC AAC TTC GTT CTA GAT GCC GTG ACC GCG GTT TTT GCT GGT AAA 1080
341 Leu Val Lys Lys Phe Gly Asn Phe Val Leu Asp Ala Val Thr Ala Val Phe Ala Gly Lys 360

1081 TAC GTC AGC GAC AGT CAG AGC GGG TTA AGG TGT CTA AGC CGC GAC TGC CTG AGG AAA ATC 1140
361 Tyr Val Ser Asp Ser Gln Ser Gly Leu Arg Cys Leu Ser Gly Asp Cys Leu Arg Lys Ile 380

1141 AGG ATA ACC TGC GAC CGC TAT GCC GTG TCG AGT GAG ATT ATA ATA GAG GCC TCC AAA GCG 1200
381 Arg Ile Thr Cys Asp Arg Tyr Ala Val Ser Ser Glu Ile Ile Ile Glu Ala Ser Lys Ala 400

1201 GGC TGT AGA ATT GTC GAA GTT CCT ATC AAG GCT GTT TAC ACT GAG TAC TTT ATG AAG AAG 1260
401 Gly Cys Arg Ile Val Glu Val Pro Ile Lys Ala Val Tyr Thr Glu Tyr Phe Met Lys Lys 420

1261 GGG ACG AAC GTT TTA GAG GGC GTT AAG ATA CCC CTG AAC CTT CTC TTT GAC AAA CTG AGC 1320
421 Gly Thr Asn Val Leu Glu Gly Val Lys Ile Ala Leu Asn Leu Leu Phe Asp Lys Leu Arg 440
SEQ ID NO: 47 and 48

Aquifex pyrophilus (28ph1)



1 ATG GAA AAT CTT GAA AAA CTC CTT GAA GTG GCA AAG ATG GCA CCC CTT GCC GGA GGA CAG 60
1 Met Glu Asn Leu Glu Lys Leu Leu Glu Val Ala Lys Met Ala Ala Leu Ala Gly Gly Gln 20

61 GTA TTA AAG GAA AAC TTC GGA AAG ATT AAG CTT GAA AAC ATT GAA GAA AAG GGA GAG AAG 120
21 Val Leu Lys Glu Asn Phe Gly Lys Ile Lys Leu Glu Asn Ile Glu Glu Lys Gly Glu Lys 40

121 GAC TTC GTG AGC TAC CTT GAT AAA ACC TCC GAA GAG AGA ATA AAA GAG CTA ATA CTT AAG 180
41 Asp Phe Val Ser Tyr Val Asp Lys Thr Ser Glu Glu Arg Ile Lys Glu Leu Ile Leu Lys 60

181 TTC TTT CCC GAC CAC GAG GTC GTG GGG GAG GAA AGG GGA AAG GAG GGA AAA GAA AGC CCT 240
61 Phe Phe Pro Asp His Glu Val Val Gly Glu Glu Arg Gly Lys Glu Gly Lys Glu Ser Pro 80

241 TAC AAA TGG TTC ATA GAC CCC CTT GAT GGG ACC AAG AAC TAC ATA AAG GGC TTT CCC ATA 300
81 Tyr Lys Trp Phe Ile Asp Pro Leu Asp Gly Thr Lys Asn Tyr Ile Lys Gly Phe Pro Ile 100

301 TTT GCA GTC TCC GTC GGA CTC CTT AAG GAA AAC GAA CCT ATA GTG GGA GCG CTT TAC CTT 360
101 Phe Ala Val Ser Val Gly Leu Val Lys Glu Asn Glu Pro Ile Val Gly Ala Val Tyr Leu 120

361 CCT TAC TTT GAT ACC CTA TAC TGG GCT TCA AAG GGA AGG GGA GCC TAT AAA AAC GGG GAG 420
121 Pro Tyr Phe Asp Thr Leu Tyr Trp Ala Ser Lys Gly Arg Gly Ala Tyr Lys Asn Gly Glu 140

421 AGG ATA AGC GTA AAG GAA AGG GGG GAG CTC AAG CAC GCG GCG GTT GTT TAC GGA TTT CCA 480
141 Arg Ile Ser Val Lys Glu Arg Gly Glu Leu Lys His Ala Ala Val Val Tyr Gly Phe Pro 160

481 TCA AGA AGC AGG AGG GAT ATA TCT CTT TAC CTG AAT GTG TTT AAA GAG GTC TTT TAC GAA 540
161 Ser Arg Ser Arg Arg Asp Ile Ser Leu Tyr Leu Asn Val Phe Lys Glu Val Phe Tyr Glu 180

541 GTA GGT TCC GTT AGG AGG CCC GGG GCC GCA GCG GTT GAT ATA TCC ATG CTT GCG GAG GGC 600
181 Val Gly Ser Val Arg Arg Pro Gly Ala Ala Ala Val Asp Ile Cys Met Leu Ala Glu Gly 200

601 ATA TTT GAC GGG ATG ATG GAG TTT GAG ATG AAG CCA TGG GAC ATA ACC GCG GGA CTC GTA 660
201 Ile Phe Asp Gly Met Met Glu Phe Glu Met Lys Pro Trp Asp Ile Thr Ala Gly Leu Val 220

661 ATA CTG AAG GAA GCT GGA GGA TTT TAC ACA CTG AAG GGA GAC CCC TTC GGC ATC TCG GAC 720
221 Ile Leu Lys Glu Ala Gly Gly Phe Tyr Thr Leu Lys Gly Asp Pro Phe Gly Ile Ser Asp 240

721 ATA ATA GCG GGA AAC AGG ATG CTC CAC GAC TTC ATT CTC AAG GTT GTG AAT AAA TAC ATG 780
241 Ile Ile Ala Gly Asn Arg Met Leu His Asp Phe Ile Leu Lys Val Val Asn Lys Tyr Met 260

781 AAT AAT GAA AGC ACG 795
261 Asn Asn Glu Ser Thr 265
Bacillus thermoleovorans (ssfys)
SEQ ID NO: 49 and 50



1 ATG ACT GAA CAC CCG GTA TTG TCT GTT CAA GGA TTA AGC GGC GGG TAT AGC ATG AAC CGA 60
1 Met Ser Glu Gln Pro Val Leu Ser Val Gln Gly Leu Ser Gly Gly Tyr Ser Met Asn Arg 20

61 CCG GTT CTG CAT GAC GTA ACC TTT CAG GTT GAA CCG GGT GAG ATG GTG GGT TTC ATC GGC 120
21 Pro Val Leu His Asp Val Thr Phe Gln Val Glu Pro Gly Glu Met Val Gly Leu Ile Gly 40

121 CTG AAC GGT CCG GGC AAG ACT ACC ACG ATG AAG CAT ATT CTC GGG CTG ATG AAT CCG CAA 180
41 Leu Asn Gly Ala Gly Lys Ser Thr Thr Met Lys His Ile Leu Gly Leu Met Asn Pro Gln 60

181 AAA GGG AGC ATT CAG GTT CAA GGA AAG AGC CCG ACA GAG CAT TCG GAA GCC TAT CAC GGC 240
61 Lys Gly Ser Ile Gln Val Gln Gly Lys Ser Arg Thr Glu His Ser Glu Ala Tyr His Gly 80

241 GCC TTG GCG TTT GTT CCC GAA TCC CCG CTG CTG TAT GAG GAG ATG ACA GTA CGA GAG CAT 300
81 Ala Leu Ala Phe Val Pro Glu Ser Pro Leu Leu Tyr Glu Glu Met Thr Val Arg Glu His 100

301 CTG GAA TTT ACG GCG CGC TCC TAT GGC GTA TCC CGT GAA GAT TAT GAG GCA CGT TCG GAG 360
101 Leu Glu Phe Thr Ala Arg Ser Tyr Gly Val Ser Arg Glu Asp Tyr Glu Ala Arg Ser Glu 120

361 CAG CTG TCG AAG ATG TTC CGT ATG GAA GAG AAG ATG GAC AGC CTG TCC ACG CAT TTG TCC 420
121 Gln Leu Ser Lys Met Phe Arg Met Glu Glu Lys Met Asp Ser Leu Ser Thr His Leu Ser 140

421 AAA GGG ATG CGC CAA AAA GTG ATG ATC ATG TGC GCA TTC GTA GCC AGA CCG TCC CTG TAC 480
141 Lys Gly Met Arg Gln Lys Val Met Ile Met Cys Ala Phe Val Ala Arg Pro Ser Leu Tyr 160

481 ATC ATT GAC GAG CCC TTT CTT GGG CTT GAT CCG CTT GGG ATA CGC TCG CTG CTT GAC TTC 540
161 Ile Ile Asp Glu Pro Phe Leu Gly Leu Asp Pro Leu Gly Ile Arg Ser Leu Leu Asp Phe 180

541 ATG CTG GAG CTG AAG GCA TCC GGC GCT TCG GTA TTG CTA AGC TCC CAC ATT 591
181 Met Leu Glu Leu Lys Ala Ser Gly Ala Ser Val Leu Leu Ser Ser His Ile 197
Pyrococcus furiosus VC1 (7ph1)
SEQ ID NO: 51 and 52



1 ATG AAG AAA ATA ACT ATT AGT ACT TTG CTT CTA CTT TTA CTT ATT TCT ACC AAT TTG AAT 60
1 Met Lys Lys Ile Thr Ile Ser Ser Leu Leu Leu Leu Leu Leu Ile Ser Thr Asn Leu Asn 20

61 CTC GCA TAC GAT TCC CAA GAG AGC GGT ATT AAA AAT ATA ATA ATC CTC ATT GGA GAC GGC 120
21 Leu Ala Tyr Asp Ser Gln Glu Ser Gly Ile Lys Asn Ile Ile Ile Leu Ile Gly Asp Gly 40

121 ATG GGA ATG AGT CAT GTC CAG ATT ACA AAG CTT GTT TAT GGT CAT CTA AAC ATG GAA GAG 180
41 Met Gly Met Ser His Val Gln Ile Thr Lys Leu Val Tyr Gly His Leu Asn Met Glu Glu 60

181 TTC CCA ATT ATT GGA TTC GAA CTT ACT GAG TCA TTA AGT GGG GAA GTT ACG GAC TCC GCT 240
61 Phe Pro Ile Ile Gly Phe Glu Leu Thr Glu Ser Leu Ser Gly Glu Val Thr Asp Ser Ala 80

241 GCA GCA GGA ACT GCA ATA GCA ACT GGA GTC AAA ACA TAT AAT CGA ATG ATT TCA GTT ACT 300
81 Ala Ala Gly Thr Ala Ile Ala Thr Gly Val Lys Thr Tyr Asn Arg Met Ile Ser Val Thr 100

301 AAC ATA ACT GGA AAA GTT ACA AAT CTA ACT ACC TTG CTT GAA ATA GCC CAG GTA CTT GGA 360
101 Asn Ile Thr Gly Lys Val Thr Asn Leu Thr Thr Leu Leu Glu Ile Ala Gln Val Leu Gly 120

361 AAA TCA ACT GGA CTT GTG ACT ACT ACT AGA ATT ACA CAC GCA ACC CCT GCA GTA TTT GCT 420
121 Lys Ser Thr Gly Leu Val Thr Thr Thr Arg Ile Thr His Ala Thr Pro Ala Val Phe Ala 140

421 TCC CAC GTT CCT GAC AGA GAT ATG GAA GAG GAA ATA GCG AGA CAG CTC ATA GCT CAC CGG 480
141 Ser His Val Pro Asp Arg Asp Met Glu Glu Glu Ile Ala Arg Gln Leu Ile Ala His Arg 160

481 GTC AAC GTC CTA TTA GGT GGA GGG AGA AAG AAA TTT GAC GAG AAT ACC CTA AAA ATG GCA 540
161 Val Asn Val Leu Leu Gly Gly Gly Arg Lys Lys Phe Asp Glu Asn Thr Leu Lys Met Ala 180

541 AAA GAA CAG GGA TAT AAT ATA GTC TTC ACG AAA GAA GAG CTC GAG AAA GCA GAG GGT GAG 600
181 Lys Glu Gln Gly Tyr Asn Ile Val Phe Thr Lys Glu Glu Leu Glu Lys Ala Glu Gly Glu 200

601 TTT ATT CTA GGG CTT TTT GCA GAT AGC CAC ATT CCT TAC GTA TTG GAC AGA AAA CCA GAA 660
201 Phe Ile Leu Gly Leu Phe Ala Asp Ser His Ile Pro Tyr Val Leu Asp Arg Lys Pro Glu 220

661 GAT GTT GGA CTT TTG GAA ATG ACT AAA AAA GCA ATT TCA ATA CTA GAG AAA AAT CCA AAT 720
221 Asp Val Gly Leu Leu Glu Met Thr Lys Lys Ala Ile Ser Ile Leu Glu Lys Asn Pro Asn 240

721 GGG TTC TTT CTC ATG ATT GAA GGG GGC AGA ATT GAT CAT GCA GCT CAT GAG AAT GAT ATA 780
241 Gly Phe Phe Leu Met Ile Glu Gly Gly Arg Ile Asp His Ala Ala His Glu Asn Asp Ile 260

781 GCA TCA GTT GTT GCA GAG ACT AAG GAG TTT GAT GAC GTT GTT GGA TAT GTT CTT GAG TAT 840
261 Ala Ser Val Val Ala Glu Thr Lys Glu Phe Asp Asp Val Val Gly Tyr Val Leu Glu Tyr 280

841 GCA AAA AAG AGG GGA GAT ACA CTA GTA ATA GTG CTG GCT GAC CAT GAG ACA GGG GGG CTT 900
281 Ala Lys Lys Arg Gly Asp Thr Leu Val Ile Val Leu Ala Asp His Glu Thr Gly Gly Leu 300

901 GGA TTA GCT CTA ACA TAT GGA CAT CCA ATT AAT GAA GAT GTC ATC ACG AAC ATA AAC GCT 960
301 Gly Leu Gly Leu Thr Tyr Gly Asp Ala Ile Asn Glu Asp Val Ile Arg Asn Ile Asn Ala 320

961 AGT GTG TCG AAA ATT GCT ACT GAA ATA AGG GCA ACG AAT GAC ATA AAG AGA CTT ATC AAA 1020
321 Ser Val Ser Lys Ile Ala Ser Glu Ile Arg Ala Thr Asn Asp Ile Lys Arg Val Ile Lys 340

1021 AAA TAT ACT GGA TTC GAG CTA AC- GAG GAC GAA ATT AAT TAC ATT GAG GAA GCT ATA AAC 1080
341 Lys Tyr Thr Gly Phe Glu Leu Thr Glu Asp Glu Ile Asn Tyr Ile Glu Glu Ala Ile Asn 360

1081 TTA GCA GAC GAA TAT GCG CTT CAA AAT GCA ATA GCT GAT ATT ATA AAC AAA CGC GTT GGT 1140
361 Leu Ala Asp Glu Tyr Ala Leu Gln Asn Ala Ile Ala Asp Ile Ile Asn Lys Arg Val Gly 380

1141 GTA GGT TTT GTA TCC CAC AAA CAT ACA GGA GCT CCT GTT TCA CTT CTA GCC TAC GGC CCA 1200
381 Val Gly Phe Val Ser His Lys His Thr Gly Ala Pro Val Ser Leu Leu Ala Tyr Gly Pro 400

1201 GGT GCA GAG AAT TTT GCA GGC TTT TTA CAC CAT GTA GAT ACG GCA AAG CTA ATT GCC AAG 1260
401 Gly Ala Glu Asn Phe Ala Gly Phe Leu His His Val Asp Thr Ala Lys Leu Ile Ala Lys 420

1261 CTA ATG CTC TTT GGG AAG AAA GAT ATT CCC GTT ACC ATC TTG GGA ATA AGT GGA GTT AAA 1320
421 Leu Met Leu Phe Gly Lys Lys Asp Ile Pro Val Thr Ile Leu Gly Ile Ser Gly Val Lys 440

1321 GGA GAT ATA ACC GCA GAC TTC AAA GTG GAT GAG CAA GAT GCA TAT GTG ACC TTA ATG ATG 1380
441 Gly Asp Ile Thr Gly Asp Phe Lys Val Asp Glu Gln Asp Ala Tyr Val Thr Leu Met Met 460

1381 TTG CTT GGG GAA AGG GTA GAT ACT GAA CTT GAA ACG AAA GTC GAC ATG AAT AAT AAC GGC 1440
461 Leu Leu Gly Glu Arg Val Asp Thr Glu Leu Glu Arg Lys Val Asp Met Asn Asn Asn Gly 480

1441 ATA ATC GAG TTG GGA GAC GTG CTC CTG ATT CTA CAA GAG TCC 1482
481 Ile Ile Glu Leu Gly Asp Val Leu Leu Ile Leu Gln Glu Ser 494
Pyrococcus furiosus VC1 (7ph2)

SEQ ID NO: 53 and 54



1 ATG ATT AAC CAA ATA AAC TTC AAA ACC TCT CAT GGA GGA AGC AGA GAA GAA GGC TAC ATA 60
1 Met Ile Asn Gln Ile Asn Phe Lys Thr Ser His Gly Gly Ser Arg Glu Glu Gly Tyr Ile 20

61 AAC TTC TCG GCC TCT GTA AAT CCT TAT CCA CCA GAA TGG ACT GAT GAA ATG TTT GAG AGG 120
21 Asn Phe Ser Ala Ser Val Asn Pro Tyr Pro Pro Glu Trp Thr Asp Glu Met Phe Glu Arg 40

121 GCT AAA AAG ATA AGC ACC TTC TAT CCT TAC TAT GAA AAG CTT GAG GAA GAA CTC TCA CAT 180
41 Ala Lys Lys Ile Ser Thr Phe Tyr Pro Tyr Tyr Glu Lys Leu Glu Glu Glu Leu Ser Asp 60

181 CTA ATT GGG GAG CCA ATA ACT ATA ACT GCA GGA ATA ACA GAG GCA CTT TAC CTG CTT GGA 240
61 Leu Ile Gly Glu Pro Ile Thr Ile Thr Ala Gly Ile Thr Glu Ala Leu Tyr Leu Leu Gly 80

241 GTT TGG ATG AGG GGT CGG AAA GTA ATA ATC CCG AAG CAC ACC TAT GGG GAA TAC GAG AGG 300
81 Val Trp Met Arg Gly Arg Lys Val Ile Ile Pro Lys His Thr Tyr Gly Glu Tyr Glu Arg 100

301 ATC TCA CCC ATG TTC GGA GGT AGG GTG ATC AAA GGT CCC AAT GAC CCA GCA AAG TTA GCA 360
101 Ile Ser Arg Met Phe Gly Gly Arg Val Ile Lys Gly Pro Asn Asp Pro Gly Lys Leu Ala 120

361 GAA TTT GTT GAA AGA AAT TCA TTC GTG TTC TTC TGC AAT CCA AAC AAT CCA GAT GGA AAG 420
121 Glu Phe Val Glu Arg Asn Ser Phe Val Phe Phe Cys Asn Pro Asn Asn Pro Asp Gly Lys 140

421 TTC TAC CGA GAA AAA GAG ATG AAA CCT CTT TTA GAT GCC ATT CAA GAC ACT AAC TCA ATT 480
141 Phe Tyr Arg Glu Lys Glu Met Lys Pro Leu Leu Asp Ala Ile Gln Asp Thr Asn Ser Ile 160

481 TTG ATC TTG GAT GAA GCC VTw* ATA GAC TTT GTT AAG AAA CCA GAA AGC CCA GAG GGA GAG 540
161 Leu Ile Leu Asp Glu Ala Phe Ile Asp Phe Val Lys Lys Pro Glu Ser Pro Glu Gly Glu 180

541 AAC ATA ATC AGG CTA AGG ACT TTT ACC AAA AGC TAC GGG CTC CCA GGG GTA AGG GTT GGA 600
181 Asn Ile Ile Arg Leu Arg Thr Phe Thr Lys Ser Tyr Gly Leu Pro Gly Val Arg Val Gly 200

601 TAT GTT ATT GGA TTT GTC GAT GCT TTC AGG AGC GTT AGA ATG CCA TGG TCA ATT GGC TCT 660
201 Tyr Val Ile Gly Phe Val Asp Ala Phe Arg Ser Val Arg Met Pro Trp Ser Ile Gly Ser 220

661 ACT GGG GTG GCC TTC TTA GAG TTC TTA CTC AAA GAT AAC TTC AAA CAC TTA AGA AAA ACC 720
221 Thr Gly Val Ala Phe Leu Glu Phe Leu Leu Lys Asp Asn Phe Lys His Leu Arg Lys Thr 240

721 CTC CCC CTA ATA TGG AAA GAA AAG GAG AGG ATT GAG AAA GAA TTG AAA GTT AAA AGC GAT 780
241 Leu Pro Leu Ile Trp Lys Glu Lys Glu Arg Ile Glu Lys Glu Leu Lys Val Lys Ser Asp 260

781 GCA AAT TTC TTC ATT ATG AAG GTC AGA GAA GGA ATA ATT GAA AAG CTA AAA GAG AAT GGC 840
261 Ala Asn Phe Phe Ile Met Lys Val Arg Glu Gly Ile Ile Glu Lys Leu Lys Glu Asn Gly 280

841 ATC CTT GTA AGG GAT TGC AAG AGC TTT GGA CTC CCT GGG TAC ATA AGG TTT TCA GTT AGA 900
281 Ile Leu Val Arg Asp Cys Lys Ser Phe Gly Leu Pro Gly Tyr Ile Arg Phe Ser Val Arg 300

901 AGG AGA GAA GAG AAT GAC AAA CTC ATA AAC ATC CTT AGA AAA ACA CTT AAT ACT 954
301 Arg Arg Glu Glu Asn Asp Lys Leu Ile Asn Ile Leu Arg Lys Thr Leu Asn Thr 318
What Is Claimed Is:

1. An isolated polynucleotide selected from the group
consisting of:

(a) a polynucleotide encoding an enzyme
comprising an amino acid sequence selected from the group
of amino acid sequences [...]

(b) a polynucleotide which is complementary to
the polynucleotide of (a); and

(c) a polynucleotide comprising at least 15
bases of the polynucleotide of (a) or (b).

2. An isolated polynucleotide selected from the group
consisting of:

(a) SEQ ID NOS: 19-27, 37-41, 43, 45, 47, 49, 51,
or 53;

(b) SEQ ID NOS: 19-27, 37-41, 43, 45, 47, 49, 51,
or 53, where T can also be U; and

(c) fragments of (a) or (b) that are at least 15
bases in length and that will hybridize to
DNA which encodes the amino acid sequence of
any of SEQ ID NOS: 28-36, 42, 44, 46, 48, 50,
52, or 54.

3. The polynucleotide of Claim 1 wherein the
polynucleotide is DNA.

4. The polynucleotide of Claim 1 wherein the 
polynucleotide is RNA. 
5. An isolated polynucleotide comprising a
polynucleotide having at least 70% identity to a member
selected from the group consisting of:

(a) a polynucleotide encoding an enzyme encoded
by the DNA contained in ATCC Deposit No. 97379, wherein
said enzyme is selected from the group consisting of
Ammonifex degensii KC4, Aquifex VF-5, M11TL, Methanococcus
igneus KOL5, Thermococcus AEDII12RA, and Thermococcus celer,
Thermococcus CL-2, and Thermococcus GU5L5;

(b) a polynucleotide complementary to the
polynucleotide of (a); and

(c) a polynucleotide comprising at least 15
bases of the polynucleotide of (a) and (b).

6. A vector comprising the DNA of Claim 1 or Claim 2.

7. A host cell comprising the vector of Claim 6.

8. A process for producing a polypeptide comprising:
expressing from the host cell of Claim 7 a polypeptide
encoded by said DNA and isolating the polypeptide.

9. A process for producing a recombinant cell
comprising: transforming or transfecting the cell with the
vector of Claim 6 such that the cell expresses the
polypeptide encoded by the DNA contained in the vector.
10. An enzyme of which at least a portion is coded
for by a polynucleotide of claim 1, and which is selected
from the group consisting of:

(a) an enzyme comprising an amino acid sequence
which is at least 70% identical to an amino acid sequence
selected from the group of amino acid sequences set forth
in SEQ ID NOS: 28-36; and

(b) an enzyme which comprises at least 30 amino
acid residues to the enzyme of (a).



11. An enzyme of which at least a portion is coded
for by a polynucleotide of claim 1, and which is selected
from the group consisting of:

(a) an enzyme comprising an amino acid sequence
selected from the group of amino acid sequences set forth
in SEQ ID NOS: 28-36, 42, 44, 46, 48, 50, 52, or 54; and

(b) an enzyme which comprises at least 30 amino
acid residues to the enzyme of (a).

12. A method for hydrolyzing phosphate bonds
comprising:

administering an effective amount of an enzyme
selected from the group consisting of an enzyme having the
amino acid sequence selected from the group of amino acid
sequences set forth in SEQ ID NOS: 28-36, 42, 44, 46, 48,
50, 52, or 54.
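
The claims above refer to several relationships between sequences: a complementary polynucleotide, a sequence in which T is written as U, fragments of at least 15 bases, and percent identity. The following is a minimal Python sketch of those four operations, offered only as an illustration. All function names are hypothetical, and the identity calculation assumes two already-aligned sequences of equal length with no gaps; the claims do not state how identity is to be computed, so a real comparison would need an alignment step first.

```python
"""Illustrative helpers for the sequence relationships named in the claims."""

# Watson-Crick pairing for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def as_rna(dna):
    """Write the same sequence with U in place of T."""
    return dna.upper().replace("T", "U")


def fragments(seq, length=15):
    """Yield every contiguous fragment of the given length."""
    for start in range(len(seq) - length + 1):
        yield seq[start:start + length]


def percent_identity(a, b):
    """Percent of matching positions in two pre-aligned, equal-length sequences.

    Assumption: no gaps and no alignment step are handled here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # First 30 bases of the Figure 1 sequence as printed in this document.
    dna = "ATGAGGGGGAGCGGAGTGCGGATACTTCTC"
    print(reverse_complement(dna))
    print(as_rna(dna))
    print(sum(1 for _ in fragments(dna)))
    print(percent_identity("ATGAGG", "ATGACG"))
```

A 30-base input contains sixteen overlapping 15-base fragments, which shows how quickly the set of fragments grows with sequence length.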
FIGURE 1



Ammonifex degensii KC4 Phosphatase (3A1A-3A2A)
Complete gene sequence 



ATGAGGGGGAGCGGAGTGCGGATACTTCTCACCAACGATGACGGCATCT^CCGAGGGT
1 MetArgGlySerGlyValArgIleLeuLeuThrAsnAspAspGlyIlePheAlaGluGly

CTGGGGGCTCTGCGCAAGATGCTGCAGCCCGTGGCTACCCTTTACGTGGTGGCTCCGGAC
21 LeuGlyAlaLeuArgLysMetLeuGluProValAlaThrLeuTyrValValAlaProAsp

CGAGAGCGTAGCGCGGCCAGCCATGCTATCACCGTTCACCGCCCCCTGCGGGTGCGGGAG
41 ArgGluArgSerAlaAlaSerHisAlaIleThrValHisArgProLeuArgValArgGlu

GCGGGTTTTCGCAGCCCCAGGCTTAAAGGCTGGGTAGTCGAG^
61 AlaGlyPheArgSerProArgLeuLysGlyTrpValValAspGlyThrProAlaAspCys

GTCAAGCTGGGCCTGGAGGTACTTTTGCCCGAACGTCCAGATTC
81 ValLysLeuGlyLeuGluValLeuLeuProGluArgProAspPheLeuValSerGlyIle

AACTACGGGCCCAACCTGGGTAC^GACGTACTTTACTCCGGCACCGTCTCGGCGGCCATA
101 AsnTyrGlyProAsnLeuGlyThrAspValLeuTyrSerGlyThrValSerAlaAlaIle

GAAGGGGTAATTAACGGCATTCCCTCGGTGGCCGTATCTTTGGCCAGGCGTC
121 GluGlyValIleAsnGlyIleProSerValAlaValSerLeuAlaThrArgArgGluPro

GACTATACCTGGGCGGCGCGGTTCGTCCTGGTCCTGCTGGAGGAACTGCGAAAACACCAA
141 AspTyrThrTrpAlaAlaArgPheValLeuValLeuLeuGluGluLeuArgLysHisGln

CTGCCCCCAGGAACCCTGCTCAACGTCAACGTGCCCGACGGGGTGCCCCGCGGGGTCAAG
161 LeuProProGlyThrLeuLeuAsnValAsnValProAspGlyValProArgGlyValLys

GTGACCAAACTGGGAAGCGTACGCTACGTCAACGTGGTAGACTGCCGCACCGACCCTCGG
181 ValThrLysLeuGlySerValArgTyrValAsnValValAspCysArgThrAspProArg

GGGAAGGCTTACTACTGGATGGCGGGAGAACCATTGGAGCTGGACGGCAACGACTCCGAA
201 GlyLysAlaTyrTyrTrpMetAlaGlyGluProLeuGluLeuAspGlyAsnAspSerGlu

ACCGACGTCTGGGCGGTGCGAGAAGGCTATATTTCCGTAACACCGGTCCAGATCGACCTT
221 ThrAspValTrpAlaValArgGluGlyTyrIleSerValThrProValGlnIleAspLeu

ACTAACTACGGCTTCCTGGAAGAACTCAAAAAATGGCGTTTCAAGGATATCTTTTCTTCT
241 ThrAsnTyrGlyPheLeuGluGluLeuLysLysTrpArgPheLysAspIlePheSerSer

TAA 

261 End 261 
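
Each figure pairs a line of nucleotides with the three-letter amino acid translation beneath it. The short Python sketch below shows one way such a pairing could be cross-checked. It assumes the standard genetic code, which the document does not state explicitly, and the function names `translate`, `split_residues` and `mismatches` are hypothetical.

```python
"""Cross-check a printed three-letter translation against its codons."""
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order, one-letter symbols ('*' marks a stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "End",
}
CODON_TABLE = {
    "".join(codon): ONE_TO_THREE[aa]
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate complete codons; unreadable codons are reported as '???'."""
    dna = dna.upper()
    return [CODON_TABLE.get(dna[i:i + 3], "???")
            for i in range(0, len(dna) - 2, 3)]


def split_residues(printed):
    """Split a run such as 'MetArgGly' into three-letter residues."""
    return [printed[i:i + 3] for i in range(0, len(printed), 3)]


def mismatches(dna, printed):
    """List (position, translated, printed) where the two lines disagree."""
    pairs = zip(translate(dna), split_residues(printed))
    return [(n, t, p) for n, (t, p) in enumerate(pairs, start=1) if t != p]


if __name__ == "__main__":
    print(translate("ATGAGGGGGAGC"))
    print(mismatches("CTGCAGCCC", "LeuGluPro"))
```

A reported mismatch only flags a disagreement between the two printed lines; it does not indicate which of the two is in error.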
FIGURE 2



Methanococcus igneus Kol5 Phosphatase (9A1A)
Complete Gene Sequence 



ATGTTGGATATACTGCTTGTTAATGATGATGGCATTTATTCAAATGGATTAATAGCTTTG
1 MetLeuAspIleLeuLeuValAsnAspAspGlyIleTyrSerAsnGlyLeuIleAlaLeu

AAGGATGCATTATTGGAAAAATTTAATGCGAGGATTACTATTGTAGCCCCAACAAATCAG
21 LysAspAlaLeuLeuGluLysPheAsnAlaArgIleThrIleValAlaProThrAsnGln

CAGAGTGGTATTGGTAGGGCAATAAGTTTATTGGAGCCGTTAAGGATAACTAAAACCAAA
41 GlnSerGlyIleGlyArgAlaIleSerLeuPheGluProLeuArgIleThrLysThrLys

TTAGCAGATGGTTCTTGGGGATATGCAGTTTCAGGAACCCCAACAGATTGCGTTATATTG
61 LeuAlaAspGlySerTrpGlyTyrAlaValSerGlyThrProThrAspCysValIleLeu

GGCATTTATGAGATATTAAAGAAGGTACCTGATGTAGTTATATCAGGAATAAACATTGGA
81 GlyIleTyrGluIleLeuLysLysValProAspValValIleSerGlyIleAsnIleGly

GAAAACCTTGGGACTGAAATAACAACTTCTGGAACGTTGGGGGCTGCGTTTGAAGGGGCC
101 GluAsnLeuGlyThrGluIleThrThrSerGlyThrLeuGlyAlaAlaPheGluGlyAla

CATCATGGGGCTAAGGCATTAGCATCATCACTCCAAGTTACCTCTGACCATCTAAAGTTT
121 HisHisGlyAlaLysAlaLeuAlaSerSerLeuGlnValThrSerAspHisLeuLysPhe

AAAGAGGGGGAGACCCCAATAGACTTCACAGTCCCAGCAAGAATTACTGCAAATGTTGTT
141 LysGluGlyGluThrProIleAspPheThrValProAlaArgIleThrAlaAsnValVal

GAGAAGATGTTGGATTATGATTTCCCATGTGATGTCGTCAACTTAAACATTCCAGAAGGA
161 GluLysMetLeuAspTyrAspPheProCysAspValValAsnLeuAsnIleProGluGly

GCAACAGAAAAGACACCGATTGAAATCACAAGGTTGGCAAG^
181 AlaThrGluLysThrProIleGluIleThrArgLeuAlaArgLysMetTyrThrThrHis

GTTGAGGAAAGAATAGATCCAAGAGGGAGGAGTTATTATTGGATTGATGGGTATCCTATT
201 ValGluGluArgIleAspProArgGlyArgSerTyrTyrTrpIleAspGlyTyrProIle

TTAGAGGAAGAGGAAGACACTGATGTCTATGTTGTTAGAAGAAAGGGACATATTTCTCTA
221 LeuGluGluGluGluAspThrAspValTyrValValArgArgLysGlyHisIleSerLeu

ACCCCATTAACATTAGACACAACAATTAAAAATTTAGAGGAATTTAAGAAAAAATATGAG
241 ThrProLeuThrLeuAspThrThrIleLysAsnLeuGluGluPheLysLysLysTyrGlu

AGAATATTAAATGAATGA
261 ArgIleLeuAsnGluEnd 266



WO 97/48416 PCT/US97/10784 

3/11 

FIGURE 3 



Thermococcus alcaliphilus AEDII12RA Phosphatase (18A)
Complete Gene Sequence 



ATGATGATGGAATTCACTCGCGAGGGAATAAAAGCTCCTGTAGAGGCACTTCAAGGGTTA
1 MetMetMetGluPheThrArgGluGlyIleLysAlaAlaValGluAlaLeuGlnGlyLeu

GGAGAGATCTACGTAGTTGCCCCAATGTTTCAAAGGAGCGCAAGTGGAAGGGCAATGACC
21 GlyGluIleTyrValValAlaProMetPheGlnArgSerAlaSerGlyArgAlaMetThr

ATCCACAGACCTGTAAGGGCTAAAAGAATAAGTATGAACGGTGCAAAAGCAGCCTATGCT
41 IleHisArgProLeuArgAlaLysArgIleSerMetAsnGlyAlaLysAlaAlaTyrAla

TTGGATGGAATGCCCGTTGATTGCGTTATCTTTGCCATG^
61 LeuAspGlyMetProValAspCysValIlePheAlaMetAlaArgPheGlyAspPheAsp

CTTGCAATAAGTGGTGTAAACTTGGGAGAAAACATGAGCACCGAGATAACGGTTTCCGGG
81 LeuAlaIleSerGlyValAsnLeuGlyGluAsnMetSerThrGluIleThrValSerGly

ACTGCAAGCGCTGCAATAGAGGCTGCAACCCAAGAGATCCCAAGC^
101 ThrAlaSerAlaAlaIleGluAlaAlaThrGlnGluIleProSerIleProIleSerLeu

GAAGTTAATAGAGAAAAACACAAATTTGGTGAGGGCGAAGAGATTGACTTCTCA
121 GluValAsnArgGluLysHisLysPheGlyGluGlyGluGluIleAspPheSerAlaAla

AAGTATTTCCTAAGAAAAATCGCAACGGCGGTTTTAAAGAGAGGCCTCCCCAAAGGA
141 LysTyrPheLeuArgLysIleAlaThrAlaValLeuLysArgGlyLeuProLysGlyVal

GATATGCTGAACGTGAACGTCCCTTATGATGCAAATGAAAGGACAGAGATAGCTTTTAGT
161 AspMetLeuAsnValAsnValProTyrAspAlaAsnGluArgThrGluIleAlaPheThr

CGCCTX^CAAGAAGGATGTATAGGCCTTCTATTGAAGAGCGCATAGACCCAAAGGGGAAT
181 ArgLeuAlaArgArgMetTyrArgProSerIleGluGluArgIleAspProLysGlyAsn

CCCTACTACTGGATAGTTGGAACTCAGTGCCGTAAGGAGGCATTAGAGCCGGGAACGGAT
201 ProTyrTyrTrpIleValGlyThrGlnCysProLysGluAlaLeuGluProGlyThrAsp

ATGTATGTAGTTAAAGTTGAGAGAAAAGTTAGCGTGACTCCAATAAACATTGATATGACA
221 MetTyrValValLysValGluArgLysValSerValThrProIleAsnIleAspMetThr

GCAAGAGTGAATTTAGACGAGATTAAAAGACTTTTAGAACTGTAG
241 AlaArgValAsnLeuAspGluIleLysArgLeuLeuGluLeuEnd 255
FIGURE 4



Thermococcus celer Phosphatase (25A1A) 
Complete Gene Sequence 



ATGAGAACCCTGACAATAAACACTGACGCGGAGGGGTTCGTTTTGAGGATTCTCCTGACG
1 MetArgThrLeuThrIleAsnThrAspAlaGluGlyPheValLeuArgIleLeuLeuThr 20

AACGACGATGGAATCTACTCCAACGGACTGCGCGCCGCTGTGAAAGCCCTGAGTGAGCTC
21 AsnAspAspGlyIleTyrSerAsnGlyLeuArgAlaAlaValLysAlaLeuSerGluLeu 40

GGCGAAGTTTACGTCGTTGCCCCCCTCTTCCAGAGGAGCGCGAGCGGCAGGGCCATGACG
41 GlyGluValTyrValValAlaProLeuPheGlnArgSerAlaSerGlyArgAlaMetThr 60

CTCCACAGGCCGATAAGGGCCAAGCGCGTTGACGTTCCCGGCGCAAAGATAGCCTACGGA
61 LeuHisArgProIleArgAlaLysArgValAspValProGlyAlaLysIleAlaTyrGly 80

ATAGATGGAACTCCTACTGACTGCGTGATTTTCGCCAT
81 IleAspGlyThrProThrAspCysValIlePheAlaIleAlaArgPheGlySerPheGly 100

TTAGCCGTGAGCGGGATTAACCTCGGCGAGAACCTGAGCACCGAGATAACAGIXTTCAG^
101 LeuAlaValSerGlyIleAsnLeuGlyGluAsnLeuSerThrGluIleThrValSerGly 120

ACGGCCTCCGCTGCCATAGAGGCCTCAACTCATGGAATTCCGAGCATAGCGATTAGCCTT
121 ThrAlaSerAlaAlaIleGluAlaSerThrHisGlyIleProSerIleAlaIleSerLeu 140

GAGGTGGAGTGGAAGAAGACCCTCGGCGAGGGTGAGGGGGTTGACTTC
141 GluValGluTrpLysLysThrLeuGlyGluGlyGluGlyValAspPheSerValSerThr 160

CACTTCCTCAAGAGAATCGCGGGAGCCCTCTTGGAGAGAGGTCTTCCTGAGGGCGTTGAC
161 HisPheLeuLysArgIleAlaGlyAlaLeuLeuGluArgGlyLeuProGluGlyValAsp 180

ATGCTCAACGTCAACGTTCCGAGCGACGCGACGGAGGAAACGGAGATAGCAATCACCCGC
181 MetLeuAsnValAsnValProSerAspAlaThrGluGluThrGluIleAlaIleThrArg 200

TTAGCCCGGAAGCGCTACTCCCCAACGGTCGAGGAGAGGATTGACCCCAAGGGCAACCCC
201 LeuAlaArgLysArgTyrSerProThrValGluGluArgIleAspProLysGlyAsnPro 220

TACTACTGGATTGTCGGCAAACTTGTCCAAGACTTCGAGCCAGGGACAGATGCCTACGCC
221 TyrTyrTrpIleValGlyLysLeuValGlnAspPheGluProGlyThrAspAlaTyrAla 240

CTGAAGGTCGAGAGGAAGGTCAGCGTCACGCCGATAAACATAGATATGACTGCGAGGGTG
241 LeuLysValGluArgLysValSerValThrProIleAsnIleAspMetThrAlaArgVal 260

GACTTTGAGGAGCTTGTAAGGGTTCTGTGGGTGTAA
261 AspPheGluGluLeuValArgValLeuTrpValEnd 272




FIGURE 5A



Thermococcus GU5L5 Phosphatase (26A1A)
Complete Gene Sequence (Part 1 of 2)

ATGAAAGGAAAGTCTCTTCTTAGCGGTCTG
1 MetLysGlyLysSerLeuValSerGlyLeuLeuLeuGlyLeuLeuIleLeuSerLeuIle 20

TCATTCCAGCCAAGCTTTGCATACTCCCCACACGGCGCTGTCAAAAACATCATAATCCTG
21 SerPheGlnProSerPheAlaTyrSerProHisGlyGlyValLysAsnIleIleIleLeu 40

GTTGGAGACCTCCATGGCTCTTGGGCATCrrAGAAATT^^
41 ValGlyAspGlyMetGlyLeuGlyHisValGluIleThrLysLeuValTyrGlyHisLeu 60

AACATGGAAAACTTTCCAGTTACTGGATTTGAGCTTACTGAT^
61 AsnMetGluAsnPheProValThrGlyPheGluLeuThrAspSerLeuSerGlyGluVal 80

ACAGATTCTGCTGCGGCAGGAACTCXIAATATCCACK^AGCTAAAACCT
81 ThrAspSerAlaAlaAlaGlyThrAlaIleSerThrGlyAlaLysThrTyrAsnGlyMet 100

ATTTCAGTAACCAACATAACCGGAAAGATAGTTAACTTAACAACCCTACTTGAAGTGGCT
101 IleSerValThrAsnIleThrGlyLysIleValAsnLeuThrThrLeuLeuGluValAla 120

CAAGAGCTTGGGAAGTCAACAGGGCTGGTCACCACAACAAGGATTACCCATGCAACTC
121 GlnGluLeuGlyLysSerThrGlyLeuValThrThrThrArgIleThrHisAlaThrPro 140

GCAGTTTTTGCGTCCCATGTCCCAGATAGGGATATGGAGGGGGAGAT^
141 AlaValPheAlaSerHisValProAspArgAspMetGluGlyGluIleProLysGlnLeu 160

ATAATGCACAAAGTTAACGTCTTGTTGGGTGGTGGAAGGGAGAAATTC
161 IleMetHisLysValAsnValLeuLeuGlyGlyGlyArgGluLysPheAspGluLysAsn 180

TTGGAGCTGGCCAAAAAGCAGGGATACAAAGTAGTTTTCACC^
181 LeuGluLeuAlaLysLysGlnGlyTyrLysValValPheThrLysGluGluLeuGluLys 200

GTTGAAGGAGATTATGTCCTAGGACTCTTTGCAGAAAGTCACATCGGTTACGTATTGGAT
201 ValGluGlyAspTyrValLeuGlyLeuPheAlaGluSerHisIleProTyrValLeuAsp 220

AGAAAACCCGATGATGTTGGACTTTTAGAAATKXXTCAAAAAGGCAATTTC
221 ArgLysProAspAspValGlyLeuLeuGluMetAlaLysLysAlaIleSerIleLeuGlu 240

AAGAACCCGAGCGGATTCTTTCTCATGGTTGAGGGCGGA
241 LysAsnProSerGlyPhePheLeuMetValGluGlyGlyArgIleAspHisAlaAlaHis 260

GGAAACGATGTCGCATCGGTTGTTGCAGAAACTAAGGAGTTTGACGATGTTGTCAGATAC
261 GlyAsnAspValAlaSerValValAlaGluThrLysGluPheAspAspValValArgTyr 280

GTGCTGGAATATCCGAAGAAGAGGGGAGATACCTTGGTAATAGTGCTTGCCGATCACGAA
281 ValLeuGluTyrProLysLysArgGlyAspThrLeuValIleValLeuAlaAspHisGlu 300

ACTGGAGCTCTTGCAATACCTCTAACCTATGGAAATGCAATCGATCAAGATGCCATAAGA
301 ThrGlyGlyLeuAlaIleGlyLeuThrTyrGlyAsnAlaIleAspGluAspAlaIleArg 320

AAAATAAAAGCAACCACXI'nKlACKIATGCCCAAAGAGGTTAAGGCAGGGACTAGTGTAAAA
321 LysIleLysAlaSerThrLeuArgMetProLysGluValLysAlaGlySerSerValLys 340
Thermococcus GU5L5 Phosphatase (26A1A)
Complete Gene Sequence (Part 2 of 2)



GACTCCTCAAAGGTATGCCGGATTTGTCCCAACAGAGGAAGAAGTCAGTATATTGAGAAT
341 GluSerSerLysValCysArgIleCysProAsnArgGlyArgSerGlnTyrIleGluAsn 360

GCGCTGCACTCGACAAACAAGTATGCCCTCTCAAATGCAGTAGCCGATGTTATAAACAGG
361 AlaLeuHisSerThrAsnLysTyrAlaLeuSerAsnAlaValAlaAspValIleAsnArg 380

CGTATTGGTGTTGGATTCACCTCCTATGAGCATACAGGAGTTCCAGTTC
381 ArgIleGlyValGlyPheThrSerTyrGluHisThrGlyValProValProLeuLeuAla 400

TACCWTCCCGGGGCAGAGMCITCAGAGGrmrrrrA
401 TyrGlyProGlyAlaGluAsnPheArgGlyPheLeuHisHisValAspThrAlaArgLeu 420

GTTGCAAAGTTAATGCTCTTIWAAGGAGGAATATTCCAGT^
421 ValAlaLysLeuMetLeuPheGlyArgArgAsnIleProValThrIleSerSerValSer 440

AGTGTTAAGGGAGACATAACCGGTGATTACAGGGTTGATGAGAAGG^
441 SerValLysGlyAspIleThrGlyAspTyrArgValAspGluLysAspAlaTyrValThr 460

CrrCATGATGrrTCTCGGAGAAAAAGTGGATAATGAAAT^
461 LeuMetMetPheLeuGlyGluLysValAspAsnGluIleGluLysArgValAspIleAsp 480

AACAACGGCATGGTTGACTTAAATGACGTCATGTTGATTCTCCAGGAA
481 AsnAsnGlyMetValAspLeuAsnAspValMetLeuIleLeuGlnGluAlaEnd 498
FIGURE 6A



0C9a Phosphatase (27A3A) 
Complete Gene Sequence (Part 1 of 2) 

ATGCCAAGAAATATCCCCGCTCTATGCGCCCTGGCCGCTTTGTTAGGGTCCGCCTGGGCG 
1 MecProArgAsnlleAlaAlaValCysAlaLeuAJLaAlaLeuLeuGlySerAlaTrpAla 20 

GCCAAAGTtCCCGTCTACCCCTACGACGGAGCCGCTTTCCTGGCGGGGCAGCG 
21 AlaLysValAlaValTyrProTyrAspGlyAlaAlaLeuLeuAlaGlyGlnArgPheAsp 40

TTGCGCATAGAAGCCTCCGAGCTGAAAGGCAATTTAAAGGCTTACCGCATCAGCCTGGAC
41 LeuArgIleGluAlaSerGluLeuLysGlyAsnLeuLysAlaTyrArgIleThrLeuAsp 60

GGCCAGCCTCTGGCGGGCCTCGAGCAAACCGCGCAGGGGGCCGGGC AGGCCGAGTOT . 
61 GlyGlnProLeuAlaGlyLeuGluGlnThrAlaGlnGlyAlaGlyGlnAlaGluTrpThr 80 

CTGCGCGGTGCCTTCCTCCGCCCTGGAAGCCACACCCTCGAC&TCAK 
81 LeuArgGlyAlaPheLeuArgProGlySerHisThrLeuGluValSerLeuThrAspAsp 100

GCTGGGGAGAGCAGGAAGAGCGTACGTTGGGAGGCTCGGCAGAACC^rTCGCTTGCCCC^ 
101 AlaGlyGluSerArgLysSerValArgTrpGluAlaArgGlnAsnLeuArgLeuProArg 120 

GCGGCCAAGAATGTGATTCTCTTCATTGGCGACGGGATGGGCTG^ 
121 AlaAlaLysAsnValIleLeuPheIleGlyAspGlyMetGlyTrpAsnThrLeuAsnAla 140

GCCCGCATCATCGCCAAAGGCTTTAACCCCGAAAACGGTATGCCCAACtX»AAACCTCGAG 
141 AlaArgIleIleAlaLysGlyPheAsnProGluAsnGlyMetProAsnGlyAsnLeuGlu 160

ATCGAGAGTGGTTACGGTGGGATGGCTACCGTCACTACCGGCAGCTTTGATAGCT 
161 IleGluSerGlyTyrGlyGlyMetAlaThrValThrThrGlySerPheAspSerPheIle 180

GCCGACTCAGCTAACTCGGCTTGTTCCATGATGACCGGGCAGAAGGTGCAGGTGAATGCC 
181 AlaAspSerAlaAsnSerAlaSerSerIleMetThrGlyGlnLysValGlnValAsnAla 200

CTCAACGTTTACCCATCAAACCTCAAAGATACCCTGGCCTACCCCCGGATCGAAACCCTA 
201 LeuAsnValTyrProSerAsnLeuLysAspThrLeuAlaTyrProArgIleGluThrLeu 220

GCGGAGATGCTCAAGCGGGTACGCGGGGCCAGCATTGGGGTAGTGACCACCACCTTCGGC 
221 AlaGluMetLeuLysArgValArgGlyAlaSerIleGlyValValThrThrThrPheGly 240

ACCGACGCTACCCCGGCTTCACTCAACGCCCATACCCGCGCCCGCGGTGATTACCAGGCT 
241 ThrAspAlaThrProAlaSerLeuAsnAlaHisThrArgArgArgGlyAspTyrGlnAla 260

ATCGCCGACATGTACTTTGGTAGAGGCGGGTTCGGTGTTCCCTTGGATGTGATGCTCTT^ 
261 IleAlaAspMetTyrPheGlyArgGlyGlyPheGlyValProLeuAspValMetLeuPhe 280

GGTGGTTCACGCGACTTCATCCCCCAGAGCACCCCTGGCTCCCGGCGCAAGGATAGCACG 
281 GlyGlySerArgAspPheIleProGlnSerThrProGlySerArgArgLysAspSerThr 300

GACTGGATTCCCCAATCCCACAAGCTGCCCTACACCTTnrTCACCACCCGCACCGAGCTG 
301 AspTrpIleAlaGluSerGlnLysLeuGlyTyrThrPheValSerThrArgSerGluLeu 320

C TXX ICGC CC AAACC C ACCC I AT A AGC TX ITTTGC IC CTCTTC AA C ATTCJ AC A A CTTC CCC AGC 
321 LeuAlaAlaLysProThrAspLysLeuPheGlyLeuProAsnIleAspAsnPheProSer 340
FIGURE 6B



0C9a Phosphatase (27A3A) 
Complete Gene Sequence (Part 2 of 2) 



TACCTAGACCGCGCACTGTGGAAGCGGCCCGAGATGCTGGGAAGCTTTACCGATATGCCC 
341 TyrLeuAspArgAlaValTrpLysArgProGluMetLeuGlySerPheThrAspMetPro

TACCTCTGGGAGATGACCCAGAAAGCCGTGGAGGCTCTCTCCAGAAACGACAAAGGCTTT 
361 TyrLeuTrpGluMetThrGlnLysAlaValGluAlaLeuSerArgAsnAspLysGlyPhe

TTCTTGATC^TTGAGGGGCCAATGGra 
381 PheLeuMetValGluGlyGlyMetValAspLysTyrGluHisProLeuAspTrpProArg 

GCACTTTGGGATGTACTCGAGCTGGACCGCGCGGTGGCTTGGGCCAAGGGCTATG^ 
401 AlaLeuTrpAspValLeuGluLeuAspArgAlaValAlaTrpAlaLysGlyTyrAlaAla

TCCCACCCCGATACCCTC^TGATTXTTCACCGCCGACCACGCTC 
421 SerHisProAspThrLeuValIleValThrAlaAspHisAlaHisSerIleSerValPhe

GGCCXOTACC^CTACTCCAAGCAC^ 
441 GlyGlyTyrAspTyrSerLysGlnGlyArgGluGlyValGlyValTyrGluAlaAlaLys

TTCCCCACCTACGGCGACAAAAAAGACGCCAAGGGCTTTCCCTTGCCCGACACCAC^ 
461 PheProThrTyrGlyAspLysLysAspAlaAsnGlyPheProLeuProAspThrThrArg

GGAATCGCGGTAGGCTTCGGGGCCACGCCGGATTACTGTGAAACCTACCGGGGCCGCGAG 
481 GlyIleAlaValGlyPheGlyAlaThrProAspTyrCysGluThrTyrArgGlyArgGlu

gtctacaaagaccccaccatctccgacggcaaaggtggttacgt6gccaaccctg^ 

501 ValTyrLysAspProThrIleSerAspGlyLysGlyGlyTyrValAlaAsnProGluVal

TGCAAGGAGCCGGGCCTTCCAACGTATCGGCAACTCCCAGTAGATAGCGCCCAGGGCGTG
521 CysLysGluProGlyLeuProThrTyrArgGlnLeuProValAspSerAlaGlnGlyVal

CACACGGCTGATCCCATGCCGCTGTTTGCCTTTGG 
541 HisThrAlaAspProMetProLeuPheAlaPheGlyValGlySerGlnPhePheAsnGly

CTCATCGACCAGACCGAGATCTTCTTCCGCATGGGGCAGGGGCTAGGGTTGAAGCCCCAC
561 LeuIleAspGlnThrGluIlePhePheArgMetAlaGlnAlaLeuGlyPheAsnProHis

CTCGAGAAGCCTTAA 
581 LeuGluLysProEnd 585



FIGURE 7



Mil TL Phosphatase (29A1A=29A2A)
Complete Gene Sequence 



ATGTATAAATGGATTATTGAGGGTAAGCTTGCCCAAGCACCTTTTCC
1 MetTyrLysTrpIleIleGluGlyLysLeuAlaGlnAlaProPheProSerLeuGlyGlu 20

CTAGCCGATCTCAAAAGACTTTTGGACGCCATTATTGTTCTTACAATGCCGCATG
21 LeuAlaAspLeuLysArgLeuPheAspAlaIleIleValLeuThrMetProHisGluGln 40

CCGCTTAATGAGAAATATATCGAGATATTAGAGAGCCATGGATTCCAAGTCCTCCATCTC
41 ProLeuAsnGluLysTyrIleGluIleLeuGluSerHisGlyPheGlnValLeuHisVal 60

CCCACGCTCGACT^CATCCTTTAGAACT^
61 ProThrLeuAspPheHisProLeuGluLeuPheAspLeuLeuLysThrSerIlePheIle 80

GATGAAAACCTGGAGAGATCCCACAGAGTGCTTGTCCACTGCA
81 AspGluAsnLeuGluArgSerHisArgValLeuValHisCysMetGlyGlyIleGlyArg 100

AGGGGGCTfGTAACTGCTGCGTACTTAATATTCAAAGG 
101 SerGlyLeuValThrAlaAlaTyrLeuIlePheLysGlyTyrAspIleTyrAspAlaVal 120

AAGCATGTGAGAACGGTAGTGCCTGGTGCTATTGAAA^ 
121 LysHisValArgThrValValProGl^ 140

GAGAACTACTATACCCTGGTCAAAAGTTTCAACAGA
141 GluAsnTyrTyrThrLeuValLysSe^ 160 

AAAATTTTCACGCTCCKjTGACCCGAAGGCGGTTCTCCACGCTT
161 LysIlePheThrLeuGlyAspProLysAlaValLeuHisAlaSerLysThrThrGlnPhe

' ACGATTSAACTCTTAAGCAACTTACACGTC^ 
181 Thrl^^ 

CAATCACTGCTGCAGTTTCACGACGTAAAAGTCCGCTCTAAACTGAAAGAAGT^ 
201 GlnSerLeuLeuHisPheHisAspValLysValArgSerLysLeuLysGluValPheGlu 220

AACATGGAATTCTCATCCGCCTCAGAGGAGGTTCTGTCATTTATTCACCTACTCGA 
221 AsnMetGluPheSerSerAlaSerGluGluValLeuSerPheIleHisLeuLeuAspPhe 240

TATCAGGATGGCAGGGTTGTTTTAACCATTTACGATTATGTCCCCG 
241 TyrGlnAspGlyArgValValLeuThrIleTyrAspTyrLeuProAspArgValAspLeu 260

ATTTTATTGTGTAACTGGGGTTCrrGATAAAATAGTTGAACTGTCGTC^ 
261 IleLeuLeuCysLysTrpGlyCysAspLysIleValGluValSerSerSerAlaLysLys 280

acccttgacaagcttgtacgaagaaaggt^ 

2ft 1 ThrVaU;JuL.y2t.PuV^IC;lyArgLysValSerl. ) Mi.'; l ,tTrpALaAsnTyrLeuAs P Tyi 

gth'ac; 

301 ValEnd 302
FIGURE 8



Thermococcus CL-2 Phosphatase (30A1A) 
Complete Gene Sequence



ATGAGAATCCTCCTCACCAACGACGACGGCATCTATTCCAACGGTCTGCGCGCGGCGGTG 
1 MetArgIleLeuLeuThrAsnAspAspGlyIleTyrSerAsnGlyLeuArgAlaAlaVal 20



AAGGGCCTGAGCGAGGTCGGCGAGGTCTACGTCGTCGCCCCGCTCTTCCAGAGGAGCGCG 
21 LysGlyLeuSerGluLeuGlyGluValTyrValValAlaProLeuPheGlnArgSerAla 40

AGCGGTCGGGCGATGACCCTACACAGGCCGATAAGGGCAAAGAGGGTTGACGTTCCCGGC
41 SerGlyArgAlaMetThrLeuHisArgProIleArgAlaLysArgValAspValProGly 60

GCGAAGATAGCGTATGGCATAGACGGAACGCCGACCGACTGCGTGATTTTTGCCATCGCC
61 AlaLysIleAlaTyrGlyIleAspGlyThrProThrAspCysValIlePheAlaIleAla 80

CGCTTCGGCGACTTTGATCTGGCGGTCAGCGGGATAAACCTAGGCGAGAACCTGAGCACG 
81 ArgPheGlyAspPheAspLeuAlaValSerGlyIleAsnLeuGlyGluAsnLeuSerThr 100

GAGATAACCGTCTCCGGAACGGCCTCGGCGGCGAT^ 
101 GluIleThrValSerGlyThrAlaSerAlaAlaIleGluAlaSerThrHisGlyIlePro 120

AGTGTAGCTATAAGCCTCGAGGTCGAGTGGAAGAAGACCCTCGGCGAGGGGGAGGGTATT 
121 SerValAlaIleSerLeuGluValGluTrpLysLysThrLeuGlyGluGlyGluGlyIle 140

GACTTCTCGGTTTCAGCACACTTCCTGAGAAGGATAGCGACGGCTGTCCTTAAGAAGGGC 
141 AspPheSerValSerAlaHisPheLeuArgArgIleAlaThrAlaValLeuLysLysGly 160

CTGCCTGAAGGGGTGGACATGCTCAACGTGAACGTCCCTAGCGACGCCAGCGAGGGGACT 
161 LeuProGluGlyValAspMetLeuAsnValAsnValProSerAspAlaSerGluGlyThr 180 

GAGATCGCCATAACGCGCCTGGCGAGGAAGGGCTATTCTCCGACGATAGAGGAGAGGATA
181 GluIleAlaIleThrArgLeuAlaArgLysArgTyrSerProThrIleGluGluArgIle 200

GACCCCAAGGGCAACCCCTACTACTGGATCGTTGGCAGGCTCGTCCAGGAGTTCGAGCCG 
201 AspProLysGlyAsnProTyrTyrTrpIleValGlyArgLeuValGlnGluPheGluPro 220 

GGCACGGACGCCTACGCTCTGAAAGTCGAGAGAAAGGTCAGCGTCACGCCCATAAACATC
221 GlyThrAspAlaTyrAlaLeuLysValGluArgLysValSerValThrProIleAsnIle 240

GACATGACTGCGAGGGTTGACTTTGAGAACCTTCAAAGGCTTCTGAGCCTGTGA 
241 AspMetThrAlaArgValAspPheGluAsnLeuGlnArgLeuLeuSerLeuEnd 258
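
Each figure pairs a line of up to 60 nucleotides with its translation in three-letter amino acid codes. As a minimal sketch of how such a pairing can be checked, the Python snippet below translates a nucleotide line codon by codon. It assumes the standard genetic code; the function name `translate`, the table names, and the use of `Xaa` for an unreadable codon are illustrative choices and are not part of the original disclosure.

```python
# Illustrative sketch: standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
    "*": "End",  # the figures label stop codons as "End"
}

# Map every codon (e.g. "ATG") to its three-letter code (e.g. "Met").
CODON_TABLE = {
    b1 + b2 + b3: THREE_LETTER[ONE_LETTER[16 * i + 4 * j + k]]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a nucleotide string in frame 1; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    # Codons containing an unreadable character are reported as "Xaa".
    return "".join(CODON_TABLE.get(dna[i:i + 3], "Xaa") for i in range(0, usable, 3))


# First nucleotide line of Figure 8 (Thermococcus CL-2 phosphatase, 30A1A).
first_line = "ATGAGAATCCTCCTCACCAACGACGACGGCATCTATTCCAACGGTCTGCGCGCGGCGGTG"
print(translate(first_line))
# prints MetArgIleLeuLeuThrAsnAspAspGlyIleTyrSerAsnGlyLeuArgAlaAlaVal
```

A translation that disagrees with the printed amino acid line points to a misread character in one of the two lines; the check alone cannot say which line is at fault.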
FIGURE 9

Aquifex VF-5 Phosphatase (34A1A)
Complete Gene Sequence

ATGGAAAACTTAAAAAAGTACCTAGAAGTTGCAAAAATAGCCGCGCTCGCGGGTGGGCAG
1 MetGluAsnLeuLysLysTyrLeuGluValAlaLysIleAlaAlaLeuAlaGlyGlyGln

GTTCTGAAAGAAAACTTCGGAAAGGTAAAAAAGGA^ 
21 ValLeuLysGluAsnPheGlyLysValLysLysGluAsnIleGluGluLysGlyGluLys

GACTTTGTAAGTTACGTGGATAAAACTTCAGAGGAAAGGATAAAGGAGGTGATAGTCAAG 
41 AspPheValSerTyrValAspLysThrSerGluGluArgIleLysGluValIleLeuLys

' TTCT^CCCGATCACGAGGTCGTAGGGGAAGAGATGGGTGCGGAGGGAAGCGGAAGCGAA 
61 PhePheProAspHisGluValValGlyGluGluMetGlyAlaGluGlySerGlySerGlu

TACAGGTGGTTCATAGACCCCCTTGACGGGACAAAGAACTACATAAACGGTTTTCCCATC 
81 TyrArgTrpPheIleAspProLeuAspGlyThrLysAsnTyrIleAsnGlyPheProIle

TTTGCCGTATCAGTGGGACTTGTTAAGGGAGAAGAGCCAATTGTGGGTC 
101 PheAlaValSerValGlyLeuValLysGlyGluGluProIleValGlyAlaValTyrLeu 120

CCTTACTrrcACAAGCTTTACTGGGGTGCTAAA^TCT^ 
121 ProTyrPheAspLysLeuTyrTrpGlyAlaLysGlyLeuGlyAlaTyrValAsnGlyLys

AGGATAAAGGTAAAGGACAATGAGAGTTTAAAGCACGCCGGAGTGGTTTACGGATTTCCC
141 ASI^XsAspAsnGluSerLeuLysHisAlaGlyValValTyrGlyPhePro 

TCTAGGAGCAGGAGGGACATATCTATCTACTTGAACAT^ 
161 SerArgSerArgArgAspIleSerIleTyrLeuAsnIlePheLysAspValPheTyrGlu

GTTGGCTCTATGAGGAGACCCGGGGCTGCTGCGGTTGACCTCTGCATGGTGGCGGAAGGG
181 ValGlySerMetArgArgProGlyAlaAlaAlaValAspLeuCysMetValAlaGluGly 

ATATTTGACGGGATGATGGAGTTTGAAATGAAGCCGTGGGACATAACCGCAGGGCTTGTA 
201 IlePheAspGlyMetMetGluPheGluMetLysProTrpAspIleThrAlaGlyLeuVal

■ ATACTGAAGGAAGCCGGGGGCGTT-TACAC^CTTGTGGGAGA^ 
221 IleLeuLysGluAlaGlyGlyValTyr^

ATAATTGCGGGCAACAAAGCCCTCCACGACT^ATACTTCAGGTAGCCAAAAAGTATATG 
241 IleIleAlaGlyAsnLysAlaLeuHisAspPheIleLeuGlnValAlaLysLysTyrMet
GAAGTGGCGGTGTGA
261 GluValAlaValEnd 265